Digitized  by  the  Internet  Archive 

in  2012  with  funding  from 

University  of  Illinois  Urbana-Champaign 


http://www.archive.org/details/similaritybetwee02cron 


BUREAU  OF  RESEARCH  AND  SERVICE 
College  of  Education 
University  of  Illinois 
Urbana,  Illinois 


SIMILARITY  BETWEEN  PERSONS  AND  RELATED  PROBLEMS  OF 

PROFILE  ANALYSIS 


Study  performed  under  Contract  N6ori-07135 
with  the  Bureau  of  Naval  Research 


Lee  J.  Cronbach 
University  of  Illinois 

and 

Goldine  C.  Gleser
Washington  University 


Technical  Report  No.  2 
April,  1952 




SIMILARITY BETWEEN PERSONS AND RELATED PROBLEMS OF PROFILE ANALYSIS

Lee J. Cronbach, Bureau of Research and Service, College of Education,
University of Illinois

Goldine C. Gleser, Department of Neuropsychiatry, School of Medicine,
Washington University

Studies of personality and behavior are turning increasingly to a simultaneous
consideration of several traits or characteristics, and a great many investigations
attempt to deal with profiles or patterns of scores.

In this paper we bring together the procedures which may be used for describing
relations between such patterns of multiple scores. A comparison of these
possible treatments leads to recommendations for improved procedures in future
investigations of similarity between persons.

The type of research on which our results bear can be illustrated by reference
to several recent studies. One is the effort by Kelly and Fiske (22) to
validate certain predictions made in the VA study of clinical psychologists. They
compared profiles of assessors' ratings with profiles of criterion ratings. Many
studies concerned with classifying patients on the basis of Wechsler-Bellevue
profiles have studied the similarity of patterns of scores, and Barnette (1) has
compared psychometric profiles of occupational groups.

Other investigators have been interested in the possibilities of "inverse factor
analysis", as introduced by Burt and developed by Stephenson. In Stephenson's hands,
the so-called Q-technique (34) has been applied widely to the study of similarity
between persons, and to the identification of types of persons. Fiedler and others
(17, 18) have used the method not only to compare one person to another, but also
to compare various perceptions by the same person. An example is the experiment in
which A describes himself along many dimensions, A predicts how B will describe
himself, and then B describes himself. Three comparisons are possible, which might be
said to indicate the "real similarity" of A and B, A's "assumed similarity" to B,
and A's "insight" into B. In addition to the foregoing studies of the equivalence of
one person's responses to another's, the statistical devices we consider are
relevant to studies of stimulus equivalence. Osgood and Suci (26, 27), for example, are
presently employing methods like those we discuss to study semantic problems by
demonstrating which words elicit similar association patterns under controlled
conditions. As another example, we find that sociometric data may be treated so as to
indicate the extent to which two group members see the group in the same way, or so
as to indicate the extent to which the two persons are perceived in the same way by
the group. That is, we can study the persons as social perceivers and also as
perceived objects. The formulas we discuss are relevant to all the foregoing types of
investigation.

Despite the rather large number of studies which employ statistical measures
of similarity, there has been no comprehensive analysis of the possible alternative
procedures. In the present paper we state a general model which clarifies the
problem of determining similarity of two score-sets. Within that model we compare
the many formulas employed to date, and advance some proposals of our own. In
our examination of procedures, we find that some popular methods, such as the
procedure of correlating profiles, have serious limitations. The
methods of Stephenson and DuMas, in particular, magnify the errors of measurement
for some if not most persons, and therefore are likely not to detect some
significant relationships.

Techniques to describe similarity between persons are needed for investigating
questions such as the following:

1. How similar are Persons 1 and 2?

2. How similar is Person 1 to Group Y?

3. How homogeneous are the members of Group Y?

4. How similar is Group Y to Group Z?

5. How much more homogeneous is Group Y than Group Z? Than the
combined sample?

Comparable questions may be asked in experimental studies regarding the two or
more measures for the same person.

While an index capable of describing the degree of similarity between score
sets is necessary for many of the investigations now being pursued, it is often
equally or more important to test hypotheses such as, "Group Y and Group Z can be
regarded as samples from the same population". The problems of inferential
statistics relevant to similarity measures have been thoroughly studied by Fisher,
Hotelling, and the Calcutta school. The necessary significance tests and
distribution functions are available for normally distributed variables, and have
recently been summarized in a most helpful review by Hodges (20). We shall not
discuss the inferential problems, being concerned in our treatment solely with
the descriptive formulas for reporting degree of similarity.

A  General  Model  and  Notation 


A profile or pattern pertaining to a person consists of a set of scores.
We shall use the following notation:

j          any of the variates a, b, c, ..., which are k in number

i          any one of the persons 1, 2, ..., N

X, Y, Z    classes of persons

$x_{ji}$   the score of person i on variate j

Considering only two persons, we have the set of $x_{j1}$ ($x_{a1}, x_{b1}, \ldots, x_{k1}$) for
person 1, and the set of $x_{j2}$ for person 2. Without placing any restriction upon
our data, we may regard the $x_{j1}$ as the coordinates of a point $P_1$ in k-dimensional
space. The $x_{j2}$ define a point $P_2$. When the variates are independent they are
properly represented by orthogonal axes, whereas correlated variables are more
appropriately represented by oblique axes. As two profiles become more similar,
the points representing them fall closer together. Accordingly, we define the
dissimilarity of two score-sets as the linear distance between the corresponding
points.

The formulas to be presented in this section apply to score-sets of many
types; viz., responses to a series of items, raw scores on a set of tests, profiles
of deviation scores, individuals' ratings of a group of stimuli on a subjective
scale, or responses to a Stephenson forced-sort procedure. We shall later discuss
the fact that in some of the above, and also in the treatment implied by
conventional measures of correlation between persons, points are limited to certain
subdivisions of the k-space. The formulas given here are as appropriate for these
restricted score-sets as for the unrestricted case.

If we assume the axes to be orthogonal, the distance D between any two
points may be easily obtained from its square by use of the generalized Pythagorean
rule,

$$D_{12}^2 = \sum_j (x_{j1} - x_{j2})^2. \qquad (1)$$

In subsequent formulas, we shall often use the symbol $\Delta x_j$ to refer to the quantity
in parentheses. The persons involved in the difference will be obvious from the
context. This formula defines the basic measure under consideration in this paper.
We shall show that all the formulas presently used in psychological research
involving similarity may be expressed in terms of formula (1) under certain stated
restrictions. This points to the fact that in current practice correlations among
the variates are generally ignored. We shall discuss later how formula (1)
compares to the true measure of distance when score-sets are correlated. However, it
can be noted here that when intercorrelations are uniformly low, no serious
distortion in the ordering of similarities among a group of individuals is incurred
by the use of the orthogonal measure. On the other hand, when one wishes to take
the correlations among the variates into account, the advisable procedure is to
transform the variates into an uncorrelated set, in which case (1) is fully
appropriate. Formula (1) and its derivatives therefore promise to be suited for most
psychological investigations of profile similarity.
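
Formula (1) can be illustrated with a minimal sketch. The two profiles below are hypothetical scores on k = 4 variates, and the function names `d_squared` and `distance` are ours, introduced only for illustration; the axes are assumed orthogonal, as in the formula.

```python
import math


def d_squared(profile_1, profile_2):
    """Formula (1): sum over variates of the squared score differences."""
    if len(profile_1) != len(profile_2):
        raise ValueError("profiles must contain the same k variates")
    return sum((x1 - x2) ** 2 for x1, x2 in zip(profile_1, profile_2))


def distance(profile_1, profile_2):
    """D, the linear distance between the two points in k-space."""
    return math.sqrt(d_squared(profile_1, profile_2))


# Hypothetical scores of two persons on k = 4 variates.
person_1 = [12.0, 9.0, 14.0, 10.0]
person_2 = [10.0, 9.0, 11.0, 16.0]

print(d_squared(person_1, person_2))  # 4 + 0 + 9 + 36 = 49.0
print(distance(person_1, person_2))   # 7.0
```

Identical score-sets give a value of zero, and the result depends on the units of each variate, which is why a standardized index is taken up next.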

With any one set of tests, the two most similar persons will have the
smallest separation, D, and also the smallest $D^2$. If the two persons have
identical score-sets, $D^2$ equals zero. If scores on any variate can range from $-\infty$ to
$\infty$, as would be the case with normally distributed variates, D and $D^2$ can increase
without limit. However, the large values have only an infinitesimal probability.
D and $D^2$ result in identical ordering of individuals with respect to dissimilarity.

A particularly interesting distance in k-space is that from the centroid of
a population to any particular point $P_i$. This distance, which we shall call the
eccentricity of an individual ($E_i$), is obtained by the formula:

$$E_i = \sqrt{\sum_j (x_{ji} - \bar{x}_{j\cdot})^2}. \qquad (2)$$

The expected value of $E^2$ for the population, that is, the dispersion of all the
points about the centroid, is given by

$$\overline{E^2} = \sum_j \sigma_j^2, \qquad (3)$$

where $\sigma_j$ is the standard deviation of variable j in the population.

Since $D^2$ depends on the number of variates in the set and on the size of their
units, it is frequently unsuitable for making comparisons from one score-set to
another. A measure of distance in standard units is therefore required. The
dispersion of the points in a population provides a "yardstick" for this purpose, since
the expected value of $D^2$ is just twice the dispersion of the population. That is,

$$\overline{D^2} = 2\,\overline{E^2} = 2 \sum_j \sigma_j^2. \qquad (4)$$


This standard index, which we call S, is defined by the equation:

$$S_{12}^2 = \frac{D_{12}^2}{\overline{D^2}} = \frac{D_{12}^2}{2\,\overline{E^2}}. \qquad (5)$$

When the measures used in a pattern of scores have been standardized on a
large sample, as is true, for example, for the Bellevue-Wechsler subtest scores,
then the standard deviations of such a reference group may be used to determine $\overline{E^2}$.
If, however, the only data available are those for a relatively small sample, then
the best estimate of $\sigma_j^2$ for the population is

$$\text{est } \sigma_j^2 = \frac{N}{N-1}\, V_j, \qquad (6)$$

where $V_j$ is the obtained variance in the sample. Correspondingly,

$$\text{est } \overline{E^2} = \frac{N}{N-1} \sum_j V_j. \qquad (7)$$

This is the value used in obtaining the standard index $S^2$.
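
Formulas (5) to (7) can be traced in a short sketch. The three profiles are hypothetical and were chosen so that the arithmetic can be checked by hand; the helper names are ours. We assume that $V_j$ is the sample variance computed with divisor N, which is what formula (6) implies.

```python
def sample_variance(values):
    """V_j: the obtained variance in the sample (divisor N, not N - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def estimated_dispersion(profiles):
    """Formula (7): est E^2 = N / (N - 1) * sum over variates of V_j."""
    n = len(profiles)
    k = len(profiles[0])
    total_v = sum(sample_variance([p[j] for p in profiles]) for j in range(k))
    return n / (n - 1) * total_v


def s_squared(profile_1, profile_2, dispersion):
    """Formula (5): S^2 = D^2 / (2 * E^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(profile_1, profile_2))
    return d2 / (2.0 * dispersion)


# Hypothetical sample: N = 3 persons, k = 2 variates.
profiles = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

# V_a = 2/3 and V_b = 8/3, so est E^2 = (3/2) * (10/3) = 5.
e2 = estimated_dispersion(profiles)

# D^2 between the first and third persons is 4 + 16 = 20, so S^2 = 20 / 10 = 2.
s2 = s_squared(profiles[0], profiles[2], e2)
print(e2, s2)
```

When reference-group standard deviations are available instead, `dispersion` would simply be the sum of their squares, as formula (3) states.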

$S^2$, like $D^2$, can range from zero, for identical score-sets, to infinity.
In the population used for reference, the mean of all $S^2$ is 1. The large values
of $S^2$ are decreasingly frequent, and for most types of distribution the probability
of large values is infinitesimal.

Correlated variates and the oblique model. The formulas presented above ignore
any correlation among the variates. The variates have been represented in an
orthogonal model, with each axis perpendicular to the rest. If correlation were
taken into account, the variates would be treated as a set of obliquely inclined
axes, such that the angle between axes would be small for highly correlated variates.
Such an oblique model is used in mathematical statistics, because it takes into
account all the available information. This oblique model underlies the development
of the discriminant function, the Hotelling T test, and the generalized distance
measure of Mahalanobis (see Hodges (20), Rao (30)). We therefore examine how a
distance measure based on the more comprehensive oblique model differs from that
obtained through the orthogonal model.

The problem which confronted Mahalanobis, and others who have used his
technique, was that of determining the distance between two groups. His measure has been
used particularly in anthropological research, where the purpose is to study the
similarity of racial and tribal groups on physical measurements. His formula is
usually written in the following form (using the block $\mathbb{D}$ to distinguish this
measure from our D):

$$\mathbb{D}^2 = \sum_i \sum_j \alpha^{ij} d_i d_j. \qquad (8)$$

To avoid confusion, we can rewrite this in a notation consistent with ours:

$$\mathbb{D}^2 = \sum_j \sum_{j'} \sigma^{jj'}\, \Delta x_j\, \Delta x_{j'}. \qquad (9)$$

Here $\sigma^{jj'}$ is the jj' element of the inverse of the combined within-group covariance
matrix $\sigma_{jj'}$. As Mahalanobis develops the problem, he deals with differences
between group means, but formula (8) can also be interpreted as related to the
difference between individuals. The Mahalanobis formula gives the same
result as would be obtained if the original variates were standardized, and then axes
were rotated to any orthogonal set of variates so that formula (1) could be applied.
In fact, since the computations required by (8) are impractical for more than a
few variates, the usual method of dealing with an oblique space is to make such a
transformation and apply (1). Rao (30) suggests one transformation (out of many
possible), which is relatively easy to apply. Suppose we begin with variates a,
b, c, ... expressed in standard measure, and seek an orthogonal set $a_o, b_o, c_o, \ldots$.
These equations may be used:

$$a_o = a$$

$$b_o = \frac{b - a\, r_{ab}}{\sqrt{1 - r_{ab}^2}}$$

$$c_o = \frac{c - a\, r_{ac} - b_o\, r_{cb_o}}{\sqrt{1 - r_{ac}^2 - r_{cb_o}^2}}$$

etc.

This transformation defines $b_o$ as the portion of b not predicted from a, and $c_o$
as the residual in c not predicted from a and $b_o$. Then $D^2$ determined from $a_o$,
$b_o$, ... is identical to $\mathbb{D}^2$ determined from a, b, ....
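
The equivalence just stated can be checked numerically. The sketch below is an illustration under stated assumptions, not the procedure as Rao published it: it obtains the successive residuals by a Cholesky factorization of the correlation matrix, which for standardized variates reproduces the equations above ($a_o = a$, $b_o = (b - a r_{ab})/\sqrt{1 - r_{ab}^2}$, and so on). The scores are hypothetical standard scores, and the two-variate closed form of the quadratic in inverse correlations is used as the check.

```python
import math


def cholesky(r):
    """Lower-triangular L with R = L L^T (R must be positive definite)."""
    k = len(r)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(r[i][i] - s)
            else:
                low[i][j] = (r[i][j] - s) / low[j][j]
    return low


def orthogonalize(x, low):
    """Solve L z = x by forward substitution; z holds a_o, b_o, c_o, ..."""
    z = []
    for i, xi in enumerate(x):
        s = sum(low[i][m] * z[m] for m in range(i))
        z.append((xi - s) / low[i][i])
    return z


def d_squared(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


r_ab = 0.70
corr = [[1.0, r_ab], [r_ab, 1.0]]
low = cholesky(corr)

# Hypothetical standard scores of two persons on variates a and b.
person_1 = [1.0, 0.5]
person_2 = [-0.5, 1.0]
z1 = orthogonalize(person_1, low)
z2 = orthogonalize(person_2, low)

# Two-variate form of the quadratic in the inverse correlation matrix.
da = person_1[0] - person_2[0]
db = person_1[1] - person_2[1]
quadratic_form = (da * da - 2 * r_ab * da * db + db * db) / (1 - r_ab ** 2)

# The two values agree up to rounding error.
print(d_squared(z1, z2), quadratic_form)
```

Because the transformation is sequential, the result for a given orthogonal variate depends on the order in which the original variates are entered, although the resulting distance does not.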

We may note several important properties of $\mathbb{D}^2$, or of $D^2$ obtained from
standardized and transformed variates. This measure has a known distribution and
thus forms the basis for testing the significance of the difference between groups.
It may also be used to determine whether additional variables add significantly
to the discrimination between groups. Moreover, $\mathbb{D}^2$ is closely related to
Fisher's discriminant function, and particularly to the proportion of individuals
classified into the wrong group by the most efficient possible discriminant
function; it is therefore a measure of the efficiency of classification. One of
the striking features of $\mathbb{D}^2$ is that all orthogonal components in a set of variables
have equal weight in the measure. The consequences of this if $\mathbb{D}^2$ is used as an
index of similarity between individuals require discussion.

In  any  set  of  variates,  some  variance  is  likely  to  be  due  to  general  qualities 
or  factors  represented  in  several  variates,  some  is  due  to  common  factors  present 
in  only  a  few  variates,  and  some  is  due  to  factors  found  only  in  a  single  variate. 
This  unique  variance  may  be  due  to  real  traits  specific  to  a  single  test,  or  it 
may  represent  error  variance.  Ordinarily,  when  we  wish  to  investigate  similarity 
in  a  domain,  we  are  concerned  with  general  qualities  found  among  a  population  of 
variables, rather than with characteristics defined by a single sample of items.
We  would  like  the  similarity  index  obtained  to  be  reliable  from  one  sample  of 
items  to  another,  so  that  the  same  pairs  of  people  will  be  reported  as  similar 
on  both  occasions.  This  problem  is  of  greater  importance  in  psychological  work, 
where  the  number  of  variates  is  unlimited  and  some  correlations  between  them  are 
low,  than  in  anthropological  work,  where  variates  are  accurately  measured  and 
highly  intercorrelated,  and  the  total  domain  under  study  is  relatively  restricted. 
Stability  of  the  similarity  index  from  one  set  of  variates  to  another  demands  more 
consideration  in  measuring  similarity  of  individuals  than  in  measuring  similarity 
of  groups.  Unreliable  factors  will  not  discriminate  appreciably  between  groups 
and therefore will not influence $\mathbb{D}^2$ between groups.

Now the Mahalanobis measure, which is designed primarily to capitalize on
separation of groups in any reliably measured factor, assigns equal weights to all
factors, whether they be general or unique. In a set of physical measurements, it
would assign equal weight to such factors as height, breadth with height constant,
and so on. If $\mathbb{D}^2$ were applied to measuring the distance between two individuals
on the Wechsler-Bellevue test, one factor might be general ability, and a second
factor might be an element common among the verbal tests. If there were ten scores
in the profile, however, there would be eight other independent factors extracted
and assigned equal weight. Most of these would be specific to particular tests,
and many of them would be primarily loaded with error of measurement. Hence $\mathbb{D}$
would assign as much weight to differences in these unimportant and perhaps
meaningless factors as to the general factor. This means that for particular pairs
of persons $\mathbb{D}^2$ will be unreliable from trial to trial and from one set of tests
to another set chosen from the same general domain.

A satisfactory solution which also takes into account the correlation among
the original variates is to assign weights to the transformed variates deliberately.
Each $a_o, b_o, c_o, \ldots$ can be assigned a weight according to its apparent importance
before formula (1) is applied. This would be especially feasible if the orthogonal
variates were based on a factor analysis, so that the investigator knows which
scores represent important general qualities, and which are unimportant residuals.
If the investigator knows that the Wechsler profile contains only four factors he
wishes to weight in the similarity index, he can assign zero weight to the
unimportant and unreliable factors. It is certainly troublesome, however, to
transform variates, especially if factor scores must be estimated. Generally, a wiser
procedure is for the investigator to make his initial measurements on a set of
variates which are nearly uncorrelated, and each of which is important and reliably
measured. This requires care in the original planning of an investigation, but
once such a set is employed, formula (1) applies directly, and the similarity
measures obtained would be generally stable if a second set of instruments
measuring the same factors were applied to the same people.

We may next examine what happens if $D^2$ is applied directly to a set of
standardized correlated variates, with the correlation not being taken into account.
This in effect takes axes which are oblique to each other and stretches the space
to place them perpendicular (Fig. 1). If we express the resulting measure in
terms of the variates $a_o$, $b_o$, etc. considered above,

$$\begin{aligned}
D^2 ={} & \Delta^2 a + (1 - r_{ab}^2)\,\Delta^2 b_o + r_{ab}^2\,\Delta^2 a + 2 r_{ab}\sqrt{1 - r_{ab}^2}\;\Delta a\,\Delta b_o \\
& + (1 - r_{ac}^2 - r_{cb_o}^2)\,\Delta^2 c_o + r_{ac}^2\,\Delta^2 a + r_{cb_o}^2\,\Delta^2 b_o + 2 r_{ac} r_{cb_o}\,\Delta a\,\Delta b_o + \cdots
\end{aligned} \qquad (10)$$


Here it is apparent that some factors are weighted more heavily than others. If
we collect terms, we find coefficients as follows:

$$\Delta^2 a:\quad 1 + r_{ab}^2 + r_{ac}^2 + \cdots \quad (k \text{ terms})$$

$$\Delta^2 b_o:\quad (1 - r_{ab}^2) + r_{cb_o}^2 + r_{db_o}^2 + \cdots \quad (k - 1 \text{ terms})$$

$$\Delta^2 c_o:\quad (1 - r_{ac}^2 - r_{cb_o}^2) + r_{dc_o}^2 + \cdots \quad (k - 2 \text{ terms})$$

etc.

$$\Delta a\,\Delta b_o:\quad 2 r_{ab}\sqrt{1 - r_{ab}^2} + 2 r_{ac} r_{cb_o} + 2 r_{ad} r_{db_o} + \cdots \quad (k - 1 \text{ terms})$$

Since by definition $a_o, b_o, c_o, \ldots$ are orthogonal, any cross-product term
summed over all pairs of persons in a sizeable sample approaches zero. Hence
in considering the weights of the various factors, on the average, we may disregard
these terms. Then the coefficients of the square terms indicate the weights of the
various factors. If the original variates are left with unequal variance, those
variances also would affect the weights. (For example, the terms in $\Delta^2 a$ would be
$\sigma_A^2 + \sigma_B^2 r_{ab}^2 + \sigma_C^2 r_{ac}^2 + \cdots$.)

It is immediately evident that a factor which appears in several of the
original variates receives greater weight in D than a factor which appears in just a
few variates. In particular, a unique factor receives relatively little weight,
and for that reason D will be more stable than $\mathbb{D}$ from one trial to another or
from one set of tests to another.
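
The coefficients of the square terms can be computed for a concrete case. The following sketch assumes three standardized variates with hypothetical intercorrelations (.6, .5, .4) and again uses a Cholesky factor L of the correlation matrix as a stand-in for the sequential transformation; under that assumption the weight of each orthogonal component in $D^2$ is the sum of squares of the corresponding column of L. The function names are ours.

```python
import math


def cholesky(r):
    """Lower-triangular L with R = L L^T (R must be positive definite)."""
    k = len(r)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(r[i][i] - s)
            else:
                low[i][j] = (r[i][j] - s) / low[j][j]
    return low


def factor_weights(r):
    """Coefficient of each squared orthogonal difference in D^2.

    Standardized differences satisfy dx = L dz, so D^2 = dz^T (L^T L) dz and
    the weight of component m is the sum of squares of column m of L.
    """
    low = cholesky(r)
    k = len(r)
    return [sum(low[i][m] ** 2 for i in range(k)) for m in range(k)]


# Hypothetical correlations: r_ab = .6, r_ac = .5, r_bc = .4.
corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
weights = factor_weights(corr)
print([round(w, 4) for w in weights])  # [1.61, 0.6556, 0.7344]
```

The first component, shared by all three variates, receives weight 1 + .36 + .25 = 1.61, while the weights still sum to k = 3. In $\mathbb{D}^2$ each of the three components would instead receive weight 1.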

For any particular pair of individuals it is impossible to evaluate the exact
weight of the various factors resulting from the use of D. This is due to the
fact that the cross-product terms contribute to the weight of each factor. Since
the product of differences may be either positive or negative, the factors make a
different contribution to D in each pair of persons. When a large number of
variables are involved, these terms will tend to cancel out so that their sum
approaches zero. Even so, it cannot be assumed for any given pair of persons that
the cross-product terms are negligible. Of course, the more nearly the original
variates are uncorrelated the less the cross-products influence the resulting D
measure.

Figure 1 sketches the transformations involved in using $\mathbb{D}$ and D as measures
of dissimilarity, for two variates with substantial correlation ($r_{AB} = .70$).
It is evident that both procedures alter the distance between points, unless we
begin with standardized variates (in which case D preserves the distances
unaltered) or begin with orthogonal variates (in which case $\mathbb{D}$ preserves the distances
unaltered). In the figure we note, for example, that originally 1 and 4 are closer
together than 1 and 2; but for both D and $\mathbb{D}$ measures, 1 and 4 are found to be
farther apart than 1 and 2.

The conclusions from our examination of the relationships between $\mathbb{D}$ and D
are as follows:

1. While $\mathbb{D}$ is the appropriate statistic to use in testing hypotheses for
significance, it is not a desirable descriptive measure of similarity for
psychological work because it places excessive weight on unimportant residual factors.
It is relatively unsatisfactory for exploratory studies seeking to chart similarity
relations in order to formulate hypotheses.

2. D has the advantage over $\mathbb{D}$ that it will tend to be more stable from one
sample of variates to another, but the presence of cross-product terms in D makes its
psychological composition uncertain for any given pair. This same uncertainty
of factorial makeup applies to any other distance measure using an orthogonal
model when variates are correlated.

3. If the investigator chooses his variables so that each one is important
and so that the set is relatively uncorrelated, then D is quite satisfactory as a
descriptive index. D will be stable from one set of reliably measured variates
to another, provided the sets are "parallel" in content, i.e., designed to measure
the same factors. (If variates are largely uncorrelated save for a single general
factor, an index $D_w$, which we shall introduce later, provides for an altered weighting
of such a general factor.)

Pearson's CRL. A precursor of the Mahalanobis measure was Pearson's
coefficient of racial likeness (28), which was likewise intended to measure distances
between groups. In its original form, CRL was essentially the same as our $D^2$, save
that each variate was expressed in standard form, and that a multiplier involving
the number of cases per group was included. A modified form of CRL which allows
for correlation among variates was also developed, but was not used because of the
computational labor required by it. Except for the factor representing number of
cases, it is essentially the same as Mahalanobis' measure.

Many of those who tried to use Pearson's index in anthropological research
were dissatisfied with it. The criticisms arising out of its sensitivity to
differences in number of cases from group to group are irrelevant to our search for
measures of similarity between individuals. Morand, in discussion of a paper by
Rao (30), notes that the form of CRL which ignores correlations has given
unreasonable results in some anthropological research, notably when the index is determined
for groups which are intuitively or theoretically quite dissimilar. This appears,
from the context, to be a consequence of the high weight CRL (like D) assigns to
any general factor having large loading among the variates. High correlations
are usual among anthropometric measures. A solution to this difficulty appears to
be an altered weighting for the first component, such as our $D_w$ (see below) provides.

Choice of scale for the measurement of dissimilarity. Although we have
defined D and S as measures of dissimilarity, formulas have been presented in terms
of $D^2$ and $S^2$. It is evident that for the purpose of description either the linear
measures or their squares could be used. Both CRL and $\mathbb{D}^2$ are expressed in terms
of the square of the distance. It should be noted, however, that these measures
were developed to test whether groups differ significantly with respect to the linear
distance between them.

Other metrics might also be chosen as measures of similarity. Cattell
proposes a transformation of $S^2$ such that the values will range from 1 to -1. The
usual product-moment correlation between persons may be obtained from $D^2$ by a
transformation of the form $1 - cD^2$. The choice of an appropriate scale depends
upon both theoretical and practical considerations.
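
The relation between the correlation between persons and $D^2$ can be made concrete in one special case. The sketch below assumes that each person's k scores are first standardized within the person (mean 0, variance 1 with divisor k); under that assumption, which the passage above does not spell out, the constant is c = 1/(2k). The profiles are hypothetical and the function names are ours.

```python
import math


def standardize_within_person(profile):
    """Rescale one person's k scores to mean 0 and variance 1 (divisor k)."""
    k = len(profile)
    mean = sum(profile) / k
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / k)
    return [(x - mean) / sd for x in profile]


def correlation_between_persons(p, q):
    """Ordinary product-moment r computed across the k variates."""
    k = len(p)
    mean_p, mean_q = sum(p) / k, sum(q) / k
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    ss_p = sum((a - mean_p) ** 2 for a in p)
    ss_q = sum((b - mean_q) ** 2 for b in q)
    return cov / math.sqrt(ss_p * ss_q)


# Hypothetical raw profiles on k = 5 variates.
person_1 = [12.0, 9.0, 14.0, 10.0, 7.0]
person_2 = [10.0, 9.0, 11.0, 16.0, 8.0]
k = len(person_1)

z1 = standardize_within_person(person_1)
z2 = standardize_within_person(person_2)
d2 = sum((a - b) ** 2 for a, b in zip(z1, z2))

r_direct = correlation_between_persons(person_1, person_2)
r_from_d2 = 1.0 - d2 / (2.0 * k)  # c = 1 / (2k) under this standardization

# The two values agree up to rounding error.
print(r_direct, r_from_d2)
```

The standardization step discards each person's own mean and scatter, so two profiles that differ only in level or spread are treated as identical by this transformation.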

One desired property of a descriptive index is that it convey to the reader
a sense of the magnitude of the quality assessed. D, which is interpretable as
linear distance, is particularly suitable in this regard, whereas $D^2$ has no such
physical representation. In addition, since distance from a given reference point
is measured on an absolute scale, one can choose any particular individual, i, as
a reference point and place all other individuals on a scale relevant to him.
Then distance has the property that an individual 6 units away from i is 3
units further or twice as far as one 3 units away from i, etc. Such a concept is
operationally useful in the investigation of dissimilarity.

A second desirable property is that errors of measurement should be independent of true
score and equal on the average for all pairs of persons. This assumption is reasonably
accurate, ordinarily, for measures of an individual's score on any variate. But when this
assumption holds for the single measures, the distance measure between the score sets of
any two persons does not have this property. Pairs with large $D$ tend to have larger
absolute error than pairs with small $D$; this inequality of errors is considerably greater
when dissimilarity is measured in $D^2$ or $S^2$. A closely related reason for preferring
$D$ to $D^2$ is that in determining averages, variances, and the like, the square measure
gives far greater weight to dissimilar than to similar pairs.

A third desirable property is ease of computation. In this, we find that $D^2$ and $S^2$
have considerable advantage. A number of simple formulas are available for determining such
results as the average $D^2$ for all pairs in a group, without computing each $D^2$. No
such simple formulas are available for $D$.
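One such shortcut can be illustrated with a small sketch. It assumes the familiar identity that the mean squared difference over all distinct pairs equals twice the variance (with the N - 1 divisor), summed over variates. The data and function names are hypothetical and are not taken from the report.

```python
from itertools import combinations
from statistics import variance


def squared_distance(p, q):
    """D^2 between two score sets: sum of squared differences over variates."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_d2_direct(profiles):
    """Average D^2 over all distinct pairs, computing every pair separately."""
    pairs = list(combinations(profiles, 2))
    return sum(squared_distance(p, q) for p, q in pairs) / len(pairs)


def mean_d2_shortcut(profiles):
    """Average D^2 from the variate variances alone: 2 * (sum of variances).

    statistics.variance uses the N - 1 divisor, which makes the identity
    exact when the average is taken over distinct pairs.
    """
    return 2 * sum(variance(column) for column in zip(*profiles))


# Hypothetical scores for four persons on three variates.
profiles = [(2, 5, 1), (4, 3, 3), (0, 6, 2), (3, 2, 6)]
direct = mean_d2_direct(profiles)
shortcut = mean_d2_shortcut(profiles)
print(direct, shortcut)
```

The two values agree, but only the first route requires visiting each pair. No comparable shortcut is offered here for the mean of $D$ itself, in keeping with the text.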

Before reaching a conclusion as to the best scale to employ, let us consider Cattell's
transformation and indicate why we have not chosen to use it. This index may be expressed
in terms of our measures, although Cattell explicitly restricts the formula to uncorrelated
variates expressed in standard deviation units with equal variance. In his notation

$$r_p = \frac{2k - \sum d^2}{2k + \sum d^2} \tag{12}$$

Here $k$ is not the number of dimensions, as in our notation, but the median chi-square
corresponding to the given number of variates. $\sum d^2$ is the same as our
$\sum_j \Delta^2 x_j$, except for Cattell's restrictions upon his variates. If we let $K$
be written for his $2k$, we can put $r_p$ in a form for comparison with our $S$, using $k$
now for number of variates.

$$r_p = \frac{K - 2kS^2}{K + 2kS^2} \tag{13}$$


$r_p$ ranges from 1 to -1 when $S$ ranges from 0 to $\infty$. With a large number of
variates, $r_p = 0$ corresponds to $S = 1$. Hence Cattell's index is directly related to
ours, but he has compressed the large distances into a small range on the $r_p$ scale. His
index is also opposite in direction to ours. Cattell's formulation is dictated by a desire
to have a correlation-like index, symmetrically distributed about zero and ranging from
1.00 to -1.00.
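The compression of large distances can be seen in a short numerical sketch of formula (13). The function name is hypothetical. When $K$ is not supplied, the sketch assumes the large-$k$ limit $K = 2k$ (median chi-square taken as approximately $k$), an approximation adopted only for this illustration.

```python
def cattell_rp(S, k, K=None):
    """r_p = (K - 2k S^2) / (K + 2k S^2), following formula (13).

    K stands for twice the median chi-square for k variates. If K is not
    given, this sketch ASSUMES the large-k limit K = 2k.
    """
    if K is None:
        K = 2 * k
    d2 = 2 * k * S ** 2
    return (K - d2) / (K + d2)


k = 20  # hypothetical number of variates
values = [(S, cattell_rp(S, k)) for S in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0)]
for S, rp in values:
    print(f"S = {S:4.1f}   r_p = {rp:+.3f}")
```

Under this assumption, doubling $S$ from 4 to 8 moves $r_p$ by less than 0.1, whereas the step from 0 to 1 covers the whole positive half of the scale.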

A correlation-like index does not appear as advantageous as a distance measure. The
reasons are as follows:

1. If our data permit people to be located anywhere in a $k$-space, no matter how far apart
$P_1$ is from $P_2$ there is no theoretical reason why there cannot be a $P_3$ such that
$P_1P_3 > P_1P_2$. We see no reason why the measure of separation should be forced to
converge toward a limit. "Complete dissimilarity of persons" is an indefinable concept,
unless there is some largest and smallest value for each variate. We do not usually expect
such limits for traits.

2. The demand for a symmetric measure seems unnecessary; on the contrary, one might
anticipate that in a multivariate normal distribution of persons there will be many very
similar pairs, and relatively few pairs who are far from each other.

3. Formulas such as we have for mean $D^2$, etc., are not possible with $r_p$.

4. $r_p$ lacks the usual properties and advantages of correlation.

In view of its operational properties and its more uniform error for all pairs of persons,
we conclude that the linear measure $D$ (or its standard form $S$) is the best descriptive
index of dissimilarity between persons. It will be noted, however, that $D$, $D^2$, $r_p$,
and many other transformations give identical results so far as the ordering of
dissimilarity is concerned. The investigator who prefers to stop with $D^2$ or $S^2$ rather
than take the square root should use non-parametric statistical methods to analyze the
results. This applies also to Q correlations, if those are obtained, and DuMas' $r_{ps}$.
Non-parametric procedures include computation of medians, chi-square, and many significance
tests which have recently been reviewed by Moses (24).

For correlating the dissimilarity measure with a criterion, it is advisable to express the
measure in terms of $D$ or $S$ and apply the product-moment formula. It is not wise to use
such procedures as computation of means, product-moment correlation, analysis of variance,
and the t-test with $D^2$ or $S^2$, if the distribution is skewed appreciably. Skewness will
be small and no serious error will be introduced if all $S^2$ are fairly near to 1.00, but
this will ordinarily occur only for restricted types of data. While we shall present
short-cut formulas based on mean $D^2$, these formulas should ordinarily be used only in
rough comparisons, where the saving of time they afford compensates for the fact that they
assign large weight to the most distant pairs and emphasize errors of measurement for such
pairs. These formulas also are of some use for checking computations.

Reduction of Data in Profiles using Derived Scores. If raw scores or standard scores are
entered in the formulas we have been discussing, we examine all the information about
individual differences which the data provide. This procedure has been recommended by
Cattell and by DuMas (13), but has rarely been followed in psychological work. Instead, the
more common practice is to study similarity of profile "shape", disregarding differences in
the overall level of scores for the person, or to study similarity of shape after equating
profiles in terms of both level and variability within the person (as in Q technique). In
effect, investigators who choose procedures of this sort are studying profiles of scores
restricted to spaces of $k-1$ or $k-2$ dimensions, respectively. Such restriction may or may
not be wise in a particular investigation. The ensuing section analyzes the various methods
by which investigators reduce the data given them, so as to give a clearer picture of the
special effects of each method.

Each derived score treatment "projects" the scores into a more restricted space. A summary
of information about the various treatments, which we shall develop gradually, is presented
in Table 1. Figure 2 provides a series of sketches to illustrate the discussion.

If we begin with raw scores for each person on $k$ variates, each person can be represented
by a point in a $k$-dimensional space, as sketched in the first panel of Figure 1. Two
specific persons, P and Q, are located. The point O is the centroid, whose coordinates are
$\bar{x}_1, \bar{x}_2, \ldots$ The point C is the origin.

We shall now define certain terms necessary to our later explanation.

Elevation is the mean of all scores for a person ($\bar{x}_i$).

Eccentricity is the square root of the sum of squares of the individual's deviation scores
from the group means $\bar{x}_j$. Geometrically, it is the person's distance from the
centroid, as shown in the figure. If $E_i$ represents the eccentricity,

$$E_i = \sqrt{\sum_j (x_{ji} - \bar{x}_j)^2} \tag{14}$$

[Figure 2. Projections implied by various distance measures. Panels: $D$, data in $k$ space;
$D'$, data projected on the $k-1$ hyperplane; $D''$, data projected from the $k-1$
hyperplane to the $k-2$ hypersphere; $D^*$, data projected from $k$ space to the $k-1$
hypersphere.]

Scatter is the square root of the sum of squares of the individual's deviation scores about
his own mean $\bar{x}_i$. That is, it is $\sqrt{k}$ times the standard deviation within the
profile. Using $E'_i$ to represent scatter, and primes to represent scores expressed in
deviate form,

$$E'_i = \sqrt{\sum_j (x_{ji} - \bar{x}_i)^2} = \sqrt{\sum_j x'^2_{ji}} \tag{15}$$

Shape is the residual information in the score-set after equating profiles for both
elevation and scatter.

When we change scores to deviations about the person's mean, we develop new scores
$x'_{ji}$, which are subject to the linear constraint

$$\sum_j x'_{ji} = 0$$

This removes from the scores any information about the person's average. Even though there
are $k$ scores still, there are only $k-1$ degrees of freedom. Two people whose scores in
each $j$ are separated by a constant amount have the same profile of $x'_j$. For example,
suppose score sets in $k$ space are as follows:

For person 1:    2   -2    0    3    2    (Elevation is 1)
For person 2:    0   -4   -2    1    0    (Elevation is -1)

Then the deviation score-set for either person is 1  -3  -1  2  1.

For these people, $D^2$ in $k$ space is 20, but $D'^2$ is zero. We shall use the prime to
refer to measures in the $k-1$ hyperplane.
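The arithmetic of this example can be retraced in a few lines. The sketch below uses the two score sets given in the text; the function names are hypothetical.

```python
def elevation(scores):
    """Mean of all scores for one person."""
    return sum(scores) / len(scores)


def deviation_scores(scores):
    """Scores expressed as deviations about the person's own mean."""
    m = elevation(scores)
    return [x - m for x in scores]


def squared_distance(p, q):
    """Sum of squared differences between two score sets."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


person_1 = [2, -2, 0, 3, 2]
person_2 = [0, -4, -2, 1, 0]

d2 = squared_distance(person_1, person_2)  # D^2 in k space
d2_prime = squared_distance(deviation_scores(person_1),
                            deviation_scores(person_2))  # D'^2 in the k-1 hyperplane

print(elevation(person_1), elevation(person_2))
print(deviation_scores(person_1))
print(d2, d2_prime)
```

Every one of the five differences between the raw score sets is 2, so the whole of $D^2$ here comes from the difference in elevation, and nothing remains once elevation is removed.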

When a profile is subjected to one linear constraint, we have in effect projected the
points into a space of one less dimension, which we refer to as a $k-1$ (dimensional)
hyperplane. We place our hyperplane through the origin, perpendicular to the direction
representing the elevation factor. Each point P projects into a new point (P') as the
figure shows. The distance PP' is the elevation, and CP' is the scatter.

It should be clear that $k$ and $k-1$ spaces yield different information about the
similarity of persons, although measures of similarity in both spaces may be of value.
Comparison of deviation scores is most frequently found in psychology in studies of
Wechsler-Bellevue profiles, where attempts are made to interpret the shape and variability
within a subject's profile. Burt also deals with such deviation scores when he employs
covariances rather than correlations between persons to obtain a matrix which he then
factors into types (5).

Before discussing further the measurement of dissimilarity in the $k-1$ hyperplane, let us
consider how such score-sets may be constrained to lie on a $k-2$ (dimensional) hypersphere.
A hypersphere is the locus in space of points all of which have the same distance from some
center. This geometric property is imposed whenever all score-sets are subject to the
constraint that the sum of squares for each set is a constant. But it may easily be seen
that this is precisely the type of constraint which is imposed by standardizing a set of
scores; i.e., dividing by their standard deviation. Dividing by the scatter of the profile
has a similar effect. If we divide each deviation score for an individual by his scatter,
this results in a score set ($x''_{ji}$) for which

$$\sum_j x''^2_{ji} = 1$$

Since, whenever scores are constrained as in a set of $x''_j$, the sum of squares is a
constant, differences in scatter among persons have been eliminated from consideration,
just as differences in elevation are eliminated when scores are expressed as deviations
from the person's mean. Conversion of score-sets from deviation scores to sets of $x''$ has
projected points from the $k-1$ hyperplane into a $k-2$ hypersphere with unit radius. This
is sketched in the third panel of Figure 2. (Because our sketch is based on a set of only
three variates, $k-2$ is only one, and the sphere in this instance is reduced to a circle.)
We define the measure of dissimilarity ($D''$) on the $k-2$ hypersphere as the distance
between score-sets having unit scatter. We might have divided scores by their standard
deviations, which would have placed all points on a sphere of radius $\sqrt{k}$. Distances
on this sphere would be a constant multiple of corresponding distances on the unit sphere.

Eliminating differences in scatter from consideration is widespread in present statistical
studies of profiles in psychology. Sometimes this is done consciously, as when Stephenson
asks subjects to sort descriptive statements into piles with a fixed number of statements
per pile, so that the resulting scores for each person have the same standard deviation.
More commonly, standardization is introduced through a correlation formula. The
product-moment formula, for example, divides cross-products by the product of the standard
deviations, and thus standardizes. Other formulas such as rho, Tau, and $r_{ps}$ have the
same effect. Our diagram shows how points P and Q, which appeared reasonably near each
other in $k$ space because they are quite similar in elevation, are found to be fairly
distant from each other when measured in the $k-1$ hyperplane; and diametrically placed,
i.e., virtually as dissimilar as possible, in the $k-2$ hypersphere. Differences of this
sort make it imperative for the investigator to decide on a rational basis which type of
score-set is to be his basis for studying the relation between persons.

The $k-1$ hyperspheres, which have not been used in psychological work, have properties of
considerable interest. Such a distribution of points is obtained by dividing the original
set of scores for each person by the square root of the sum of squares. If the original
variates are measured in meaningful units with an absolute zero, then the square root of
the sum of squares, which represents the distance of a point from the origin, might be
considered to be a measure of overall "size." Division by this measure extends all points
to unit distance from the origin. Two score sets which are in the same proportion, such as
(4, 8, 2) and (2, 4, 1), lie on the same vector and thus project to the same point on the
hypersphere. Thus proportional score sets have zero dissimilarity as measured on this
hypersphere, just as geometric figures for which corresponding sides are in proportion are
termed "similar". Dissimilarity may be measured in terms of the distance between the points
on the unit hypersphere ($D^*$) or in terms of the cosine of the angle between the vectors.
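A minimal sketch of this projection, using the two proportional score sets from the text and a third, hypothetical set (1, 4, 8) for contrast. The function names are illustrative only. The final comparison relies on the standard relation between chord length and angle on a unit sphere, $D^{*2} = 2(1 - \cos\theta)$.

```python
import math


def to_unit_length(scores):
    """Divide a score set by the square root of its sum of squares ("size")."""
    size = math.sqrt(sum(x * x for x in scores))
    return [x / size for x in scores]


def distance(p, q):
    """Linear distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine(p, q):
    """Cosine of the angle between two score vectors drawn from the origin."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


first, second = [4, 8, 2], [2, 4, 1]   # proportional sets from the text
third = [1, 4, 8]                      # hypothetical, not proportional to the others

d_star_same = distance(to_unit_length(first), to_unit_length(second))
d_star_diff = distance(to_unit_length(first), to_unit_length(third))
cos_diff = cosine(first, third)

print(d_star_same)                                # zero, up to rounding
print(d_star_diff ** 2, 2 * (1 - cos_diff))       # the two agree
```

The distance and the cosine therefore carry the same information on the unit hypersphere; either may be reported.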

More appropriate to psychological data is projection onto the hypersphere with the centroid
of the population as center. This projection is achieved by dividing each score by the
eccentricity. Thus differences in eccentricity are removed from consideration. All persons
who deviate in the same direction from the group average are projected into the same point
and thus are considered to be the same "type". The measure of separation on this
hypersphere is designated $D^*$.

Relations between measures of dissimilarity for original and derived score-sets. Formula
(1) for $D$, and (5) for $S$, are equally correct whether data occupy the $k$ space or are
confined to a smaller space by one or more constraints. It is of value to compare the
indices by examining the effect of treating the same set of data successively in the
various spaces. We begin with the relation between $D$ and $D'$.

$$D'^2 = \sum_j \Delta^2 x'_j = \sum_j \Delta^2 x_j - k\,\Delta^2 \bar{x} \tag{17}$$

The first member on the right-hand side is $D^2$ and the second component is proportional
to the squared difference in elevation, $\Delta^2 \bar{x}$; i.e.,

$$D'^2_{12} = D^2_{12} - k\,\Delta^2 \bar{x} \tag{18}$$

On the average over all pairs,

$$\overline{D'^2} = 2\sum_j \sigma_j^2 - 2k\,\sigma_{\bar{x}}^2 \tag{19}$$

Here, $\sigma_{\bar{x}}^2$ is the variance of elevation scores, over the population of
persons.

$$S'^2 = \frac{D^2 - k\,\Delta^2 \bar{x}}{2\left(\sum_j \sigma_j^2 - k\,\sigma_{\bar{x}}^2\right)} \tag{20}$$

These relationships between measures of dissimilarity in $k$ and $k-1$ space suggest the
possibility of constructing a new measure of dissimilarity in which elevation is given any
desired weight $w$. Such a measure would allow for weighting the elevation and shape
factors to predict a particular criterion, if one is available. It also permits reducing
the excessive weight the elevation factor receives when variates are substantially
correlated, as for the investigations where Morand found difficulties with CRL. Suppose we
denote the new measure of distance by $D_w$.

$$D_w^2 = D'^2 + w\,k\,\Delta^2 \bar{x} \tag{21}$$

and

$$S_w^2 = \frac{D'^2 + w\,k\,\Delta^2 \bar{x}}{2\left(\sum_j \sigma'^2_j + w\,k\,\sigma_{\bar{x}}^2\right)} = \frac{D^2 - k(1-w)\,\Delta^2 \bar{x}}{2\left(\sum_j \sigma_j^2 - k(1-w)\,\sigma_{\bar{x}}^2\right)} \tag{22}$$

When $w$ is zero, $D_w$ reflects differences in shape and scatter only; as $w$ approaches
1, $D_w$ approaches $D$. Because of its flexibility, formula (22) (or its numerator alone)
appears to be the most suitable basis for determining similarity of persons. We shall
discuss below some reasons for this recommendation.
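The behavior of $D_w$ at the two ends of its range can be sketched numerically, reading formula (21) as $D_w^2 = D'^2 + w\,k\,\Delta^2\bar{x}$. The function name and the second score set are hypothetical.

```python
import math


def weighted_distance(p, q, w):
    """D_w: distance in which the elevation difference receives weight w.

    Reads formula (21) as D_w^2 = D'^2 + w * k * (difference in elevation)^2,
    where D' is the distance between the deviation-score profiles.
    """
    k = len(p)
    mean_p, mean_q = sum(p) / k, sum(q) / k
    d2_prime = sum(((a - mean_p) - (b - mean_q)) ** 2 for a, b in zip(p, q))
    return math.sqrt(d2_prime + w * k * (mean_p - mean_q) ** 2)


p = [2, -2, 0, 3, 2]     # elevation 1
q = [1, -4, -2, 3, -3]   # hypothetical profile, elevation -1

d2_k_space = sum((a - b) ** 2 for a, b in zip(p, q))   # ordinary D^2
for w in (0.0, 0.5, 1.0):
    print(w, weighted_distance(p, q, w) ** 2)
print(d2_k_space)
```

For these profiles $D'^2$ is 14 and $D^2$ is 34, so $D_w^2$ moves linearly from 14 to 34 as $w$ goes from 0 to 1.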

The relationship of the measure of distance in $k$-space to that on the $k-1$ hypersphere
may be derived from the law of cosines. For the hypersphere with center at the centroid and
radius equal to unity,

$$D^{*2} = \frac{D^2 - (E_1 - E_2)^2}{E_1 E_2} = \frac{D^2 - \Delta^2 E}{E_1 E_2} \tag{23}$$

where $\Delta E$ is the difference in eccentricity of the two individuals. Since the
distance measure is defined for a unit hypersphere, the values of $D^*$ have a possible
range of 0 to 2, regardless of the number of variates involved. Thus, when score-sets are
divided by the square root of their sum of squares the measure of distance is comparable
from one set of variates to another and there is no need of further standardization by a
measure such as $S^*$.

The relationship between $D'$ (in the $k-1$ hyperplane) and $D''$ ($k-2$ hypersphere) is
analogous to that which holds between $D$ and $D^*$.

$$D''^2 = \frac{D'^2 - (E'_1 - E'_2)^2}{E'_1 E'_2} = \frac{D'^2 - \Delta^2 E'}{E'_1 E'_2} \tag{24}$$

where $E'$ is the measure of scatter.

$D''$ may be written in terms of the $D$ measure from $k$-space as

$$D''^2 = \frac{D^2 - k\,\Delta^2 \bar{x} - \Delta^2 E'}{E'_1 E'_2} \tag{25}$$

This formula shows clearly what types of differences between individuals, represented in
the original data and in $D^2$, are discarded when we employ only $k-2$ space information.
One of the subtracted terms represents differences in elevation; the other represents
differences in scatter.

Here, again, since we have defined $D''$ as a measurement on the unit sphere, the values
range from 0 to 2 ($0 \le D''^2 \le 4$). The expected value of $D''^2$ is

$$\overline{D''^2} = 2\left(1 - k\,\sigma^2_{\bar{x}''}\right), \tag{26}$$

where $\sigma^2_{\bar{x}''}$ is the variance, over variates, of the means of the scores
after division by the scatter. It may be noted that the average value is 2 if and only if
all variates (in derived score units) have equal means over all persons. This is of
particular interest because of the close relationship of $D''^2$ to the correlation measure
frequently used to show similarity between persons, which we now discuss.
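The decomposition of $D''^2$ into a $k$-space distance less an elevation term and a scatter term can be checked on a small case. The sketch reads the relation as $D''^2 = (D^2 - k\,\Delta^2\bar{x} - \Delta^2 E') / (E'_1 E'_2)$; the profiles and function names are hypothetical.

```python
import math


def elevation(scores):
    """Mean of a person's scores."""
    return sum(scores) / len(scores)


def scatter(scores):
    """Square root of the sum of squared deviations about the person's mean."""
    m = elevation(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores))


def unit_scatter_scores(scores):
    """Deviation scores divided by the scatter (points on the k-2 hypersphere)."""
    m, e = elevation(scores), scatter(scores)
    return [(x - m) / e for x in scores]


p = [2, -2, 0, 3, 2]
q = [1, -4, -2, 3, -3]
k = len(p)

d2 = sum((a - b) ** 2 for a, b in zip(p, q))            # D^2 in k space
elevation_term = k * (elevation(p) - elevation(q)) ** 2  # removed with elevation
scatter_term = (scatter(p) - scatter(q)) ** 2            # removed with scatter

d2_from_formula = (d2 - elevation_term - scatter_term) / (scatter(p) * scatter(q))
d2_direct = sum((a - b) ** 2
                for a, b in zip(unit_scatter_scores(p), unit_scatter_scores(q)))

print(d2, elevation_term, scatter_term)
print(d2_from_formula, d2_direct)
```

Of the 34 units of $D^2$ in this case, 20 are attributable to elevation and a little over 3 to scatter; only the remainder, rescaled by the product of the scatters, survives in $D''^2$.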

Comparison of measures in $k-2$ space. Let us specifically consider the relationship
between Q, the correlation between persons, and $D''$. It is easily shown that

$$D''^2 = 2(1 - Q) \tag{27}$$

Thus it is evident that our formulation in terms of $D''^2$ encompasses any results
obtained by product-moment correlation among profiles, including those from Stephenson's
forced-sort method. In particular, any distortions imposed by use of an orthogonal model
for correlated variates will affect studies using correlation between persons.
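Formula (27) can be confirmed directly on any pair of profiles. The following sketch uses two hypothetical score sets and computes Q by the ordinary product-moment formula across variates; the function names are illustrative.

```python
import math


def unit_scatter_scores(scores):
    """Deviation scores about the person's mean, divided by the scatter."""
    k = len(scores)
    m = sum(scores) / k
    dev = [x - m for x in scores]
    e = math.sqrt(sum(d * d for d in dev))
    return [d / e for d in dev]


def q_correlation(p, q):
    """Product-moment correlation between two persons, taken over variates."""
    k = len(p)
    mean_p, mean_q = sum(p) / k, sum(q) / k
    cross = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    ss_p = sum((a - mean_p) ** 2 for a in p)
    ss_q = sum((b - mean_q) ** 2 for b in q)
    return cross / math.sqrt(ss_p * ss_q)


p = [2, -2, 0, 3, 2]     # hypothetical profiles
q = [1, -4, -2, 3, -3]

d2_double_prime = sum((a - b) ** 2
                      for a, b in zip(unit_scatter_scores(p), unit_scatter_scores(q)))
Q = q_correlation(p, q)
print(d2_double_prime, 2 * (1 - Q))
```

Because the relation is exact, ranking pairs of persons by Q or by $D''$ gives the same order, reversed in direction.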


It may be noted that average Q for a population is zero when $\overline{D''^2} = 2$. But we
have seen that this is true only when all variates have equal means. Thus, if items of
unequal popularity are chosen for the sample of traits, the expected value of Q is greater
than 0. Inclusion of some items on which members of the sample tend to agree will increase
the correlation between individuals. Some implications of this will be discussed later.

The three prominent correlational procedures using ranking are rho, Tau (23), and DuMas'
$r_{ps}$ (13). Rank-correlations are sometimes used in the belief that assumptions
regarding the test score metric are thereby avoided. This is not the case for rho. When
each score is assigned a rank, the separation between two adjacent ranks is fixed, over the
whole range. The result is that all profiles are forced into the same rectangular
distribution, just as Stephenson's forced-choice sorting forces all profiles into the same
normal distribution. Such forced distributions appear to be fully justified only if the
investigator regards a particular distribution as most likely to represent the nature of
his profiles. Usually rho and product-moment correlations give about the same results for a
particular set of data.
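The way ranking imposes a metric can be illustrated briefly. In the sketch below, two hypothetical profiles with very different spacing between scores receive identical ranks, and so stand in identical relation to any third profile under rho. The formula used is the usual one for untied ranks, $1 - 6\sum d^2 / [k(k^2 - 1)]$; ties are assumed absent.

```python
def ranks(scores):
    """Rank 1 = lowest score. Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    result = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        result[j] = rank
    return result


def rho(p, q):
    """Rank correlation from squared rank differences (untied ranks)."""
    k = len(p)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(p), ranks(q)))
    return 1 - 6 * d2 / (k * (k * k - 1))


uneven = [1, 2, 3, 50]      # hypothetical: one score far above the rest
even = [10, 20, 30, 40]     # hypothetical: equally spaced scores
other = [3, 1, 4, 2]        # hypothetical comparison profile

print(ranks(uneven), ranks(even))
print(rho(uneven, other), rho(even, other))
```

Whatever information lay in the large gap between 3 and 50 is discarded; after ranking, adjacent scores are always treated as one unit apart.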

Kendall's Tau gives values substantially lower than rho. It is a rank correlation based on
the direction of differences between all possible pairs of variates. Tau is preferred to
rho in some studies because its sampling distribution is known. If $\text{Tau}_1 >
\text{Tau}_2$, then $\text{rho}_1 > \text{rho}_2$, in almost all pairs of cases. That is to
say, Tau is very nearly a function of rho. Analysis by Tau will therefore yield conclusions
very like those from rho, and both of these will be reasonably close to results from
$D''^2$ and Q. One difficulty with Tau is that the number of comparisons which must be made
increases rapidly with the number of variates.

The third coefficient, proposed by DuMas*, is simple to compute. Kelly and Fiske drew our
attention to the fact that $r_{ps}$ is an approximation of sorts to Tau. Whereas Tau calls
for considering all possible pairs of variates, $r_{ps}$ uses only the adjacent variates.
I.e., if a profile is written in a certain order (Computational, Scientific, Mechanical,
...), $r_{ps}$ would consider the direction of difference between Computational and
Scientific, and Scientific and Mechanical, but would not use the difference of
Computational and Mechanical. Rearranging the profile in different order would change the
correlation, for different pairs would now be used. If the arrangement of variates is a
random selection out of all possible orders, or if the variates are uncorrelated, $r_{ps}$
is an estimate of Tau. If there is any rationale underlying the arrangement, $r_{ps}$ is
peculiarly biased. Consider a Wechsler profile of five verbal and five performance scores.
These will conventionally be listed in that order. For this profile, Tau would be based on
45 pairs of scores: 10 verbal with verbal, 10 performance with performance, and 25 verbal
with performance. $r_{ps}$ would use only nine pairs: 4 V V, 4 P P, 1 V P. In this example,
$r_{ps}$ is determined almost wholly by the smallest differences in scores, which are least
reliable. $r_{ps}$ would therefore be lower than Tau for Wechsler profiles, and possibly by
a large amount. Because it uses relatively little information (here, 9 pairs out of 45),
$r_{ps}$ is expected to be inexact even when it is unbiased.
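The pair counts in the Wechsler example can be reproduced by enumeration. The sketch below only counts which pairs of variates each coefficient would consult; it does not compute Tau or $r_{ps}$ themselves, and the labels and function name are hypothetical.

```python
from itertools import combinations

# Five verbal subtests followed by five performance subtests, in listed order.
labels = ["V"] * 5 + ["P"] * 5


def tally(pairs):
    """Count verbal-verbal, performance-performance, and mixed pairs."""
    counts = {"VV": 0, "PP": 0, "VP": 0}
    for i, j in pairs:
        kind = labels[i] + labels[j]
        counts["VP" if kind in ("VP", "PV") else kind] += 1
    return counts


all_pairs = list(combinations(range(len(labels)), 2))            # consulted by Tau
adjacent_pairs = [(j, j + 1) for j in range(len(labels) - 1)]    # consulted by r_ps

tau_counts = tally(all_pairs)
rps_counts = tally(adjacent_pairs)
print(len(all_pairs), tau_counts)
print(len(adjacent_pairs), rps_counts)
```

More than half of the pairs available to Tau compare a verbal with a performance score, whereas only one of the nine adjacent pairs does so.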


*Incidentally, DuMas (13) suggests chi-square as the preferred method of estimating
similarity where a more precise approach is required. This suggestion is unsound, since
profile entries are scores rather than frequencies and chi-square cannot be used with such
data.

Basic Decisions in Profile Comparison

The comparison of two profiles will give different results, depending upon the
investigator's choices at several points in the planning of the investigation. During the
development of techniques of Q correlation, there has been some confusion and dispute
regarding these matters, but at this point, Burt and Stephenson, at least, seem to be in
agreement on the principles underlying the method. Many of the issues have been discussed
with exceptional soundness by Burt, in The Factors of the Mind, Chapters VI and XI. Anyone
who proposes to study relations between persons by Q correlation or other measures should
examine Chapter VI with care. Although Burt discusses specifically the use of Q correlation
in factor analysis, the same questions regarding metric and domain apply to any descriptive
studies of relation between persons.

The investigator must define a trait-domain within which similarity is to be investigated.
There is a certain amount of loose thinking regarding the concept of similarity of persons
which occasionally leads investigators to regard their studies as an attempt to determine
which persons are generally similar. Such views are encouraged by occasional references to
Q-technique as a method for studying "the whole personality". Actually, the investigator,
either by plan or by the necessary limitations of any instrument, can study only a
relatively limited segment of the person, and it will be noted that Stephenson himself now
places great emphasis on the proper definition of the segment of personality to be
investigated.

The investigator defines the domain where he is seeking to investigate similarity by four
choices:

1. He chooses the set of variates.

2. He chooses a metric for each variate.

3. He assigns equal or differential weights to each variate.

4. He decides to study similarity in $k$ space or in some restricted portion of the $k$
space.

The investigator can make each of these decisions quite arbitrarily, but he is more likely
to arrive at useful and scientifically meaningful results if he has a carefully considered
reason for each decision. Different decisions will be arrived at in different problems. In
the use of objectively measured variates, such as are used in anthropometric studies, the
appropriate decision may be different from the decision reached in designing a study of
subjective estimates of personality. This is a departure from Stephenson's view, since he
always employs variates restricted to $k-2$ space.

Choice of variates. Similarity is always similarity in some respects. If we know that two
people are quite similar in ten different characteristics, we cannot infer that, on some
other set of characteristics uncorrelated with the first ones, they will be any more
similar than randomly selected people. Since the number of characteristics which might be
the object of study is essentially unlimited, it is reasonable to expect that people who
are similar in one respect will be quite dissimilar in some other domains of behavior. The
domain to be studied will have to be selected with care.* One group of qualities especially
promising for investigations of similarity are pervasive and general variables which affect
performance in many situations; examples are general mental ability, cultural background,
and some of the commonly identified personality traits. Another type of variate which may
be profitably used is the more specific qualities which seem likely to be associated with
some criterion performance with which the experimenter is concerned. For example, in study
of performance of a military group, it might be appropriate to determine the similarity of
members' attitudes toward being in military service, or attitudes toward military
discipline. Having defined a domain of traits in which he is interested, the investigator
might, in theory, draw a random sample of traits to be measured. He obtains much greater
control over his investigation if he uses a planned or stratified sample in which
deliberately chosen characteristics are measured as reliably as possible. Such a procedure
is exemplified in Stephenson's recent use of a "factorial design" for selecting variates
(34).

*This comment also applies to measures of "empathy" or "diagnostic accuracy." There is
little evidence that the person who is able to judge one quality is also a superior judge
of other qualities (8).

In  general,   the  more  frequently  a  quality  is  represented  in  the  set  of 
variates, the more weight it has in the similarity measure. If items are grouped
into uncorrelated subtests, each of which has known variance, the variance of the
subtest indicates its relative weight in the total. It is therefore appropriate to
include a greater number of items dealing with qualities which are especially
important for the investigation.

Choices regarding metric. All psychometric studies must make some assumption
regarding the metric or scale units in which the variates are measured. Except in
very limited problems, psychologists and educators have lacked scales with equal
units, or scales in which a unit on one scale is exactly comparable to a unit on the
other scale, representing equal amounts of the properties being measured. In fact,
it is doubtful whether comparability between scales can ever be established save by
arbitrary assumption. Yet any study of similarity of persons demands assumptions of
comparable units along a scale and between scales.

The investigator must choose for each variate a scale such that he regards one
unit as representing the same amount of the property at all points of the scale. If
the investigator does not regard this assumption as valid for some scale, he should
transform the units to a scale he regards as more nearly linear with respect to the
property measured. In most psychological studies, so much error is present to
obscure relationships that failure to obtain a scale of equal intervals will have little
effect upon the conclusions. When studies turn to more precisely measured
variables than present psychological tests afford, this question becomes more
crucial.

The  second  assumption  is  peculiar  to  profile  analysis:     the  investigator  must 
assume  that  one  unit  represents  the  same  degree  of  similarity  on  all  variates. 
Comparability of metrics is unlikely to be testable in most problems. The
assumption enters research on similarity because a two-point difference in Block
Design, for instance, has the same effect on the index of profile similarity as does
a two-point difference in Arithmetic. The use of standard scores is only a device
to improve on manifestly non-comparable raw score units; the new units may also
lack perfect comparability, and to that degree studies of profile similarity
contain error. The investigator may modify the units to make them more comparable
to one another, in whatever respect concerns him. If he regards one measure as
more important than another in an overall estimate of similarity, he may
deliberately assign larger units to that variate.

When an investigation deals with objectively measured variates, such as
physical measures or test scores, the metric is altered by standardizing, weighting, and
other transformations. We can illustrate choice of a metric by referring to the
Rorschach M score. When an investigator uses raw score units, he is counting the
difference from zero M to 3 M as equal to a difference from 30 M to 33 M. If he
normalizes the score, he will weight the former difference more heavily because
the raw-score distribution is skewed. Normalizing would be advisable if, on
psychological grounds, the investigator regards the difference from 0 to 3 as more
important than the second difference. No general recommendation can be made as to
whether a variate distribution should be normalized or not.
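
The effect of normalizing a skewed score can be sketched numerically. The
frequency table below is invented for illustration and is not Rorschach data; the
function name `normalized` and the mid-rank percentile convention are likewise
assumptions of this sketch, not a procedure taken from the report.

```python
from statistics import NormalDist

# Hypothetical, deliberately skewed frequency table of raw scores
# (score -> number of persons). The counts are invented for illustration.
freq = {0: 20, 1: 20, 2: 15, 3: 12, 4: 8, 5: 6, 6: 5,
        8: 4, 10: 3, 15: 2, 20: 2, 30: 1, 33: 1}
n = sum(freq.values())


def normalized(score):
    """Normal deviate at the mid-rank percentile of an observed score."""
    below = sum(count for s, count in freq.items() if s < score)
    percentile = (below + freq[score] / 2) / n
    return NormalDist().inv_cdf(percentile)


raw_low, raw_high = 3 - 0, 33 - 30            # equal differences in raw units
norm_low = normalized(3) - normalized(0)      # difference at the crowded low end
norm_high = normalized(33) - normalized(30)   # difference in the sparse upper tail
print(raw_low, raw_high, round(norm_low, 2), round(norm_high, 2))
```

With a distribution skewed in this way, the two differences are equal in raw
units, but the difference at the low end is the larger one after normalizing.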

The variance of each characteristic over persons, and therefore its influence
on the D measure, will be determined by the choice of units. If variates are
expressed in standard units, each variate is assigned equal weight. Now sometimes this
is quite appropriate; it is common in identifying physical types to express length of
nose and length of limb in standard units so that they make equal contributions.
Conceivably, in a study of resemblance in appearance, such equal variation would be
inappropriate. Does a person who departs from the average by one standard deviation
in length of eye-brow seem as distinctive as one who departs by one s.d. in length
of nose? The selection of weights is ordinarily arbitrary, and equal weighting (i.e.,
standardization of variates) is often the best arbitrary choice. If a criterion is
available, optimal predictive weights may be selected. The discriminant function is
a device for weighting variates to maximize separation between criterion groups.
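
A minimal sketch of how the choice of units fixes each variate's weight in D
follows. The measurements are invented, and the helper names (`standardize`,
`limb_share`) are hypothetical. The sketch sums squared differences over all pairs
of persons and reports the share contributed by one variate.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical raw measurements in millimetres for five persons (invented values).
limb = [700.0, 760.0, 820.0, 880.0, 940.0]
nose = [46.0, 52.0, 50.0, 44.0, 48.0]


def standardize(xs):
    """Express scores as deviations from the group mean in s.d. units."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def limb_share(limb_scores, nose_scores):
    """Share of the summed D^2 over all pairs of persons that is due to limb length."""
    pairs = list(combinations(range(len(limb_scores)), 2))
    limb_part = sum((limb_scores[i] - limb_scores[j]) ** 2 for i, j in pairs)
    nose_part = sum((nose_scores[i] - nose_scores[j]) ** 2 for i, j in pairs)
    return limb_part / (limb_part + nose_part)


raw_share = limb_share(limb, nose)
std_share = limb_share(standardize(limb), standardize(nose))
print(round(raw_share, 3), round(std_share, 3))
```

In raw units the variate with the larger variance accounts for nearly all of the
distance; in standard units the two variates contribute equally.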

When the measurements are subjective, the choice of metric presents further
difficulties. Subjective ratings are used in studies of esthetic preference, in
self-ratings, or in ratings of other persons. One may standardize the ratings
assigned on each variate, but this assumes that the stimuli judged are equally variable
on each quality. Perhaps it is more reasonable to suppose, for example, that pupils
vary much more in sociability than in obedience. Ratings of different qualities
can sometimes be made more comparable by defining the points along the rating scale
explicitly. Sometimes the ratings by a person can be expressed in terms of his
j.n.d. Sometimes one can accept the rating scale as a scale of equal-appearing
intervals. After comparable subjective judgments on the several variates are obtained,
differential weights according to supposed importance may be assigned if desired.

Inclusion of elevation in the difference measure. The domain is further defined
by the decision to use k, k - 1, or k - 2 space. Elevation is defined by the average
of a person's scores. It has an obvious meaning in the Wechsler test, where
elevation is essentially an overall measure of ability. In a Rorschach score-set,
elevation represents responsiveness, being highly correlated with total R. Holzinger
(21) has demonstrated that the average of scores is heavily loaded with the first
principal component of the scores, i.e., with the general factor or other frequently
represented factor. Thus, if scores are correlated, elevation represents the common
thread among them. On the other hand, if scores have low correlations, the elevation
score represents a mixture of factors and has no interpretable significance. A
ratio based on the sum of interitem covariances has been suggested (11, 1?) as an
index which generally reflects the extent to which elevation represents a
common factor. If this ratio is large, one can regard elevation as saturated with
some psychological quality. If this ratio is small, however, elevation lacks
psychological meaning. In fact, the elevation component may be purely arbitrary if
scores are uncorrelated. For instance, many personality profiles could be scored
as logically if the direction of some variables were reversed, submission being
counted instead of dominance, for example. This would lower the elevation (average
on all traits) for very dominant persons, and raise it for submissive ones. Such
reversals do not alter D in k space, but they do affect D' and D". Stephenson
attempts to avoid this problem when he obtains self-descriptions from a set of
variates which includes a statement and another nearly opposite in sense. He might
have one submissive statement, and an opposite dominant one. For such a balanced
set of variates, the elevation should be near zero for each person, and any non-zero
elevation score could be safely disregarded as due to inconsistency of response.

Elevation can be considered a meaningful score rather than an arbitrary
composite only when variates are generally correlated, so that the "positive" direction
of each can be determined operationally. When the elevation score is interpretable,
one can decide whether differences in elevation should affect D. Sometimes it is
wise to include elevation in the difference measure and sometimes it is unwanted.
Often the elevation score represents a response set (10) such as tendency to say
Like to interest items in general, or to say Yes in checking descriptions of
symptoms. Investigators differ in their judgment as to whether such variables are
due to transient verbal sets or are important aspects of behavior, and, indeed,
response sets seem to involve both qualities. If the investigator wishes to include
elevation, whatever it measures, in determining differences between persons, he
should use the full k-space data. He may be well advised, however, to use a special
weight for elevation, following our equation (21), since without this precaution
the general factor tends to have a disproportionate influence. The investigator may
instead decide to extract the common factor and study similarity in elevation
separately from similarity in profile shape. If he decides to discard elevation or
to study it separately, he will then go on to compute distances in k - 1 space (or,
for reasons to be discussed, in k - 2 space).
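
The relation between the k-space distance and its elevation component can be
sketched as follows. Equation (21) is not reproduced in this section, so the
weighted form used here is only an assumption: the k - 1 space distance plus w times
k times the squared difference in elevation. It is chosen so that a weight of unity
returns D in k space and a weight of zero returns the k - 1 measure. The scores and
function names are invented for illustration.

```python
def elevation(profile):
    """Average of a person's scores."""
    return sum(profile) / len(profile)


def d_squared(a, b):
    """Squared distance between two score-sets in the full k space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def d_prime_squared(a, b):
    """Squared distance after each person's elevation is removed (k - 1 space)."""
    ea, eb = elevation(a), elevation(b)
    return sum(((x - ea) - (y - eb)) ** 2 for x, y in zip(a, b))


def weighted_d_squared(a, b, w):
    """Assumed weighted form: D'^2 + w * k * (difference in elevation)^2."""
    k = len(a)
    return d_prime_squared(a, b) + w * k * (elevation(a) - elevation(b)) ** 2


person_a = [12.0, 9.0, 11.0, 8.0]   # invented scores, elevation 10
person_b = [7.0, 6.0, 9.0, 6.0]     # invented scores, elevation 7

print(d_squared(person_a, person_b))                # full k-space value
print(d_prime_squared(person_a, person_b))          # elevation discarded
print(weighted_d_squared(person_a, person_b, 0.5))  # elevation given half weight
```

Intermediate weights let the investigator keep elevation in the measure while
limiting the influence of the general factor.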

Cattell and DuMas have argued that elimination of elevation is always
questionable. For many studies, it is surely valueless to say that two people are similar
in  profile  shape  but  not  in  elevation.  For  example,  "Vocabulary  higher  than  Digit 
Span"  means  something  qualitatively  different  for  a  college  graduate  with  IQ  120 
from  what  it  means  for  a  ten-year  old  of  IQ  100  or  an  adult  of  IQ  80. 

The elimination of elevation, moreover, eliminates what is often the more
reliable information in the score sets, and differences from test to test within a
profile of deviate scores may be extremely unreliable and therefore a poor basis
for investigations. This difficulty is especially to be expected when variates are
highly correlated.

Our own conclusion is that:

1.  Elevation  should  be  included  in  the  distance  measure  with  a  deliberately 
chosen weight if to do so makes similarity a more interpretable property.

2.  Elevation  should  be  eliminated  from  the  distance  measure  only  when 
the  investigator  decides  that  the  average  is  saturated  with  a  quality 
he  desires  to  exclude  from  the  domain  in  which  similarity  is  measured. 

It is of interest to note that Ebel (14), working on the related problem of
reliability of ratings (which deals with similarity of score-sets), arrives at a
similar  recommendation.  In  his  problem,  the  mean  level  of  ratings  assigned  by  each 
rater  is  comparable  to  our  "elevation",  and  he  lists  practical  considerations  which 
make  it  wise  at  some  times,  and  unwise  at  others,  to  consider  differences  in  level 
in assessing agreement between sets of ratings.

Transformation to k - 1 sphere. The projection which eliminates differences
in eccentricity from consideration and places all points on a k - 1 hypersphere
may or may not have practical value. In studies where configurations having
geometric similarity are thought of as representing similar types, D* appears to
deserve consideration. Such a problem is likely to be encountered in work with
body-types or other physical measurement, where concern is literally with shape rather
than with size.

Measurement of shape by projecting onto a sphere with the population centroid
as center to obtain D' likewise has possible interest. Unlike measurement of shape
by D* in the k - 1 plane, D' is invariant no matter which end of a dimension is
taken as the positive direction. We may think of a person as having a factor
specification equation, just as a test can be specified in terms of reference
factors. The specification for the person tells what factors account for his
deviation from the mean. Since D* treats as identical people who have the same
factorial specification, no matter how far they deviate, it may be the appropriate
measure for some type-theories. The limitations upon interpretation of D', however,
include the serious difficulties which we discuss below in connection with k - 2
space.

Considerations  in  using  k  -  2  sphere.  The  treatments  in  k  -  2  space  will  be 
discussed at length, because such procedures are especially common. Projection
onto the k - 2 sphere treats as identical those profiles which are proportional when
expressed as deviations from the person's elevation. For example, D" would be 0,
and Q would be 1, for this pair of score-sets:

     3    1    0    4     (Elevation =  2; deviation profile is  1  -1  -2   2)
     1   -3   -5    3     (Elevation = -1; deviation profile is  2  -2  -4   4)

Those profiles having small scatter are magnified in projection onto the sphere
(or we might say that those having large scatter are diminished proportionately).
Figure 3 draws attention to some consequences. We note that differences between
persons near the center of the sphere are much magnified. The small D'_{12} becomes
a large D"_{12}, but D"_{34} is little greater than D'_{34}. Points 1 and 2 represent
persons with flat profiles. People who would be judged quite similar in k or k - 1
plane are sometimes reported as markedly dissimilar in the k - 2 measure.
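
The pair of proportional score-sets given above can be checked numerically. In
this sketch the deviation profile is rescaled to unit length before comparison; that
normalizing constant, and the function names, are assumptions made for illustration
rather than the report's own definitions.

```python
from math import sqrt


def deviation_profile(scores):
    """Scores expressed as deviations from the person's own elevation."""
    el = sum(scores) / len(scores)
    return [x - el for x in scores]


def unit_scatter_profile(scores):
    """Deviation profile rescaled to unit length (a point on the k - 2 sphere)."""
    dev = deviation_profile(scores)
    scatter = sqrt(sum(d * d for d in dev))
    return [d / scatter for d in dev]


def q_correlation(a, b):
    """Product-moment correlation between two persons' score-sets."""
    ua, ub = unit_scatter_profile(a), unit_scatter_profile(b)
    return sum(x * y for x, y in zip(ua, ub))


def d_double_prime(a, b):
    """Distance between the two persons after elevation and scatter are equated."""
    ua, ub = unit_scatter_profile(a), unit_scatter_profile(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(ua, ub)))


first = [3, 1, 0, 4]
second = [1, -3, -5, 3]
print(deviation_profile(first), deviation_profile(second))
print(round(q_correlation(first, second), 6), round(d_double_prime(first, second), 6))
```

The second deviation profile is twice the first, so the two persons coincide on
the sphere even though their scatter differs by a factor of two.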


Figure 3. Magnification of distances in projection onto sphere


Figure 4. Effect of error and scatter on the projection onto a sphere

Figure 4 indicates the effect of the projection when error of measurement is
involved. Each sketch shows a set of obtained measurements such as might be obtained
on repeated testing of one person, assuming that his error variance over trials is
equal for all variates, and that errors are independent in k space. We show three
cases: low scatter low error, low scatter moderate error, and high scatter moderate
error. The second circle makes it clear that if a profile has small scatter, even
a small amount of error may cause a drastic variation in the person's position in
k - 2 space. A person for whom the variates are truly equal would fall at C in the
k - 1 plane. On different trials he would have an equal probability of falling
anywhere on the sphere, and might at different times take diametrically opposite
positions. The implication is that the position of some persons in k - 2 space will
be far more variable than others, and that such methods as D", Q, rho, and Tau will
give unreliable similarity measures for persons with rather flat profiles. This
is an expression, in other terms, of the sometimes-neglected principle that
differences between two variates within a profile cannot be interpreted with confidence
unless the original variates are reliable and not saturated with a common factor (?5^).
If the conventional assumption that error of measurement is equal for all persons
is approximately true for the original variates, and if flat profiles in k space can
be expected, the assumption of equal error is not even approximately true for
measures of people's positions in k - 2 space.
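
The instability of flat profiles in k - 2 space can be illustrated by a small
simulation. The true profiles, error standard deviation, number of trials, and seed
below are all invented, and errors are assumed normal and independent with equal
variance for every variate. The sketch correlates two error-laden administrations
of the same person many times and averages the result.

```python
import random
from math import sqrt


def pearson(a, b):
    """Product-moment correlation between two score-sets."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(ssa * ssb)


def mean_retest_q(true_profile, error_sd, trials, rng):
    """Average correlation between two administrations that differ only by error."""
    total = 0.0
    for _ in range(trials):
        first = [t + rng.gauss(0.0, error_sd) for t in true_profile]
        second = [t + rng.gauss(0.0, error_sd) for t in true_profile]
        total += pearson(first, second)
    return total / trials


rng = random.Random(1952)                        # fixed seed so the run is repeatable
flat = [10.0] * 6                                # truly equal variates: no scatter
scattered = [4.0, 7.0, 10.0, 13.0, 16.0, 19.0]   # same error, large scatter
q_flat = mean_retest_q(flat, 1.0, 500, rng)
q_scattered = mean_retest_q(scattered, 1.0, 500, rng)
print(round(q_flat, 2), round(q_scattered, 2))
```

Under these assumptions the retest correlation for the flat profile averages near
zero, while the profile with large scatter is located almost identically on both
occasions, although the error of measurement is the same in the two cases.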

Stanley (33) has provided some data which confirm our analysis. He administered
the Allport-Vernon Study of Values twice, and correlated the two profiles. This
correlation  is  a  measure  of  distance  between  the  two  profiles  in  k  -  2  space.  For 
each  person,  he  had  a  correlation  and  also  a  measure  of  scatter  within  the  profile; 
these  two  correlated  .38  over  all  persons,  the  greater  scatter  being  associated  with 
the  greater  reliability. 

The question must now be raised whether the study of profiles in k - 2 space,
or more specifically, whether correlation between profiles in the usual manner, is a
justifiable line of investigation. If the removal of the first factor and the
magnification of error variance when scatter is equated are both disadvantageous, is a
procedure which involves both of these worth further consideration? Knowing,
however, that Stephenson and others using his techniques have obtained significant
results, we cannot dismiss the method until we determine why the faults we suspect
have not interfered too drastically with their investigations. The explanation seems
to be that there are conditions where k - 2 space data give useful and not-unduly
inaccurate results.

Consider first the question of removal of the first principal component, as is
done when deviate scores are obtained. This projects a distribution of points in k
space into a k - 1 space, and in so doing removes the variance due to elevation. The
same elimination of elevation is accomplished by the forced-sort technique. An
essential condition for the resulting data to be useful is that the position in k - 1
space must be determined with substantial reliability. Under what circumstances can
we expect reliability after the first component is removed? If the variates are
nearly uncorrelated, each variate contributes to the total dispersion of persons
approximately in proportion to V_j, and the elevation score removed constitutes one kth of
the total variance. The component removed from D^2 will on the average be only one
kth of the total, and D'^2 will be quite similar to D^2. Now this is what happens in
the type of Q-sort Stephenson originally proposed, where variates were sampled from
a heterogeneous collection. If a set of variates involves about fifty factors, all
more or less equally weighted, removal of one factor is not expected to alter
distances between persons enough to cloud results. As more correlated variates are used,
extracting the elevation factor does discard more of the possibly-important variance,
and the residual information will be more unreliable as a result.
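
A small simulation can illustrate how much of D^2 is removed along with
elevation. The data are artificial: k = 50 variates for 60 simulated persons, drawn
either independently or with an added general factor of equal variance. The seed,
sample sizes, and the name `elevation_share` are assumptions of the sketch. The
share removed is computed as k times the squared difference in elevation, summed
over all pairs of persons, divided by the summed D^2.

```python
import random
from itertools import combinations


def elevation_share(profiles):
    """Share of summed D^2 over all pairs that is due to differences in elevation."""
    k = len(profiles[0])
    total = removed = 0.0
    for a, b in combinations(profiles, 2):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
        removed += k * (sum(a) / k - sum(b) / k) ** 2
    return removed / total


rng = random.Random(7)   # fixed seed so the run is repeatable
k, n = 50, 60

# Uncorrelated variates with equal variance.
uncorrelated = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]

# Variates sharing a hypothetical general factor g of the same variance.
correlated = []
for _ in range(n):
    g = rng.gauss(0, 1)
    correlated.append([g + rng.gauss(0, 1) for _ in range(k)])

share_uncorrelated = elevation_share(uncorrelated)
share_correlated = elevation_share(correlated)
print(round(share_uncorrelated, 3), round(share_correlated, 3))
```

With uncorrelated variates the share removed is small, of the order of one kth;
with a strong general factor a large part of the distance between persons is
discarded with elevation.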

The second question relates to the effect on reliability of projection from
k - 1 hyperplane to k - 2 hypersphere. This projection leads to substantial
magnification of error if a profile is flat. Recalling that C represents the center of the
sphere, and is the point corresponding to a flat k - 1 profile, and that O' is the
centroid in k - 1 space, we can expect few flat profiles if the dispersion of persons
(O'P')^2 is much smaller than (CO')^2. This demands that O' be some distance from C, or
in other words, that the means for the k variates not be equal. ((CO')^2 is the sum of
squares of these means.) The more persons fall close to C, the more will
magnification of errors for them obscure results obtained with k - 2 measures. When many
variates are used in the profile, as in Stephenson's forced-sort method, there is a
good chance that some of the means will be unequal, and flat profiles then are less
common. It is to be noted that when original scores are expressed as deviations
around the group mean there will be many flat profiles; such scores are badly suited
to k - 2 space procedures. In general, the essential condition is that flat profiles
in k - 1 space be rare or absent.

While use of variates with unequal means will reduce the number of flat
profiles, this has the disadvantage that correlations then tend to become larger and
more uniform, so that one obtains less information about differences between persons.
In the extreme, if items differ widely in popularity, most persons will rank them in
the same order and almost all Q correlations will be 1.00.
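
A toy example shows how widely differing item means can swamp individual
differences. The item means and the two persons' leanings are invented, and the two
persons are deliberately given exactly opposite leanings.

```python
from math import sqrt


def pearson(a, b):
    """Product-moment correlation between two score-sets."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(ssa * ssb)


item_popularity = [0.0, 10.0, 20.0, 30.0]   # invented, widely unequal item means
lean_a = [1.0, -1.0, 1.0, -1.0]             # person A's departures from the means
lean_b = [-1.0, 1.0, -1.0, 1.0]             # person B departs in the opposite way

person_a = [m + d for m, d in zip(item_popularity, lean_a)]
person_b = [m + d for m, d in zip(item_popularity, lean_b)]

q_leanings = pearson(lean_a, lean_b)     # correlation of the individual leanings
q_scores = pearson(person_a, person_b)   # Q correlation of the obtained scores
print(round(q_leanings, 3), round(q_scores, 3))
```

The two persons are opposite in every individual leaning, yet their obtained
score-sets correlate very highly because both follow the common order of item
popularity.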

Similarity between individuals or within a group can apparently be given no
psychological interpretation unless it is measured in a domain in which at least some
pairs of people are dissimilar. The similarity index obtained for any set of
items depends to a major degree on the discriminating power of the items. This means
that the absolute magnitude of the Q correlations cannot be directly interpreted and
may have no practical significance to investigations of similarity. Only when it is
demonstrated that a difference between groups or between pairs of individuals in
magnitude of correlation exists is it possible to offer an interpretation. Fiedler (16),
for example, asked therapists of several schools to rate statements describing a
therapeutic relationship in order to determine if they differ along school lines in
their concept of an ideal therapeutic relationship. The correlations between ratings
were positive and large (median .64). One would be tempted to interpret such
correlations as indicating a high degree of similarity among the therapists regardless of
school. But it is also possible that the statements used represented such markedly
desirable and undesirable qualities that high agreement could be found in almost any
sample of persons acquainted with therapy. Undoubtedly statements about more
debatable qualities in the therapeutic relationship could be found which would result
in much lower correlations among therapists of different schools. However, Fiedler
goes on to extract the valuable information that the expert therapists correlate
higher with one another, regardless of school, than they do with non-expert
therapists of the same school. This difference supports his major conclusion, since it
indicates that the choice of items had not completely pre-determined the correlations.
It is disturbing to realize, however, that choice of more obviously desirable and
undesirable statements might have resulted in higher correlations in both groups,
so that the differences he found would have been obscured. This demonstrates that
while Q correlations can be used to show the relative similarity of two pairs of
persons, or persons in two groups, little meaning can be attached to the size of a
Q correlation per se.

It  is  not  surprising  that  most  profile  studies  today  utilize  comparisons  in 
k  -  2  space,  since  the  problems  have  been  conceived  in  terms  of  correlation  as  used 
to  study  relationships  between  tests.  However,  it  is  questionable  whether  that  model 
is  a  particularly  good  one.  For  the  determination  of  similarity  between  two  tests, 
it is reasonable to eliminate the mean and variance from consideration. As Thomson
(35) and Burt have pointed out, the test mean represents its general level of
difficulty for the population, while the variance is a function of the units used. Both
of  these  values  are  usually  quite  arbitrary,  depending  on  the  choice  and  number  of 
items,  and  since  we  are  mainly  interested  in  the  underlying  relationship  between 
tests,  these  values  are  equated.  However,  in  dealing  with  similarity  of  individuals, 
it  is  necessary  to  consider  rather  carefully  what  is  involved  when  individuals  are 
equated  for  level  and  scatter. 

To  illustrate  the  interpretation  that  can  be  made  for  measures  in  k  or  k  -  1 
space,  which  measures  in  k  -  2  space  do  not  allow,  we  refer  to  a  study  by  Bendig  (2). 
He  asked  professors  of  psychology  to  rank  15  professional  journals  in  terms  of  their 
importance for study by graduate students. These ranks were correlated and factor
analyzed, leading to the conclusion that there were three bi-polar types of persons,
described in terms of (a) interest in experimental approach to psychopathology, (b)
interest in statistical and psychometric theory, (c) interest in theory construction
in clinical area. Suppose Bendig had asked the judges to rate the journals on some
objectively-defined  scale  ranging,  for  example,  from  "Knowledge  of  most  contents  of 
this  journal  should  be  required  on  comprehensive  examinations"  to  "Reading  this 
journal  will  not  be  worthwhile  for  any  student".  Then  the  elevation  factor  (tendency 
to  give  many  journals  high  ratings)  would  reveal  something  about  the  judge's  view  of 
graduate  training.  A  judge  who  wants  students  to  read  many  journals  differs  from  a 
judge  who  rates  only  a  few  high,  even  though  he  gives  the  same  rank  order  to  the 
journals.  Moreover,  the  variability  of  the  ratings  by  a  judge  would  indicate  his 
tendency  to  differentiate  within  the  field  of  psychology,  regarding  some  areas  as 
worthwhile  and  some  as  trivial.  A  judge  with  a  flat  profile  would  be  reporting  that 
he  is  equally  sympathetic  to  a  wide  range  of  psychological  interest.  A  judge  with 
a  wide  variation  of  ratings  indicates  a  stronger  differentiation.  Two  judges  who 
ranked  the  journals  in  the  same  order,  but  who  differed  in  the  scatter  of  their 
ratings, would be expected to allow quite different latitude for students in
training. At one point, Bendig characterizes his subjects as arranged from a
"theoretical-experimental-statistical" pole to a "practical-nonexperimental-intuitive" value
orientation. Possibly, rather than this typology, a k - 1 or k space measure would
reveal that the judges could be better grouped in terms of specialized versus
catholic values.
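
The point can be made concrete with two hypothetical judges who rate five
journals on a nine-point scale. The ratings are invented and are not Bendig's data.
Both judges order the journals identically, so a rank correlation cannot
distinguish them, while the k-space distance does.

```python
from math import sqrt


def ranks(scores):
    """Rank 1 for the highest rating; assumes there are no tied ratings."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]


def spearman_rho(a, b):
    """Rank-difference correlation for untied ratings."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def distance(a, b):
    """D in the full k space."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


generous = [9, 7, 8, 5, 6]    # rates most journals as worth reading
selective = [9, 3, 4, 1, 2]   # same order, but only one journal rated high

rho = spearman_rho(generous, selective)
d = distance(generous, selective)
elevation_generous = sum(generous) / len(generous)
elevation_selective = sum(selective) / len(selective)
print(rho, d, elevation_generous, elevation_selective)
```

A measure in k - 2 space reports the two judges as identical; the difference in
elevation, which reflects how much reading each would require, appears only in the
k-space measure.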

Combining  our  two  conditions,  it  appears  that  measures  in  k  -  2  space  can  give 
useful  information  only  if  the  dispersion  of  persons  in  k  -  1  space  and  also  the 
scatter  for  nearly  all  persons  are  large  relative  to  the  error  dispersion.  Data  in 
k  -  1  space  are  required  to  determine  whether  these  conditions  are  met.  Then  one  can 
determine whether profiles in k - 1 space are reliable (1^), and whether there are
many flat profiles. Moreover, one can if he wishes eliminate the people with flat
profiles from the study. The forced-sort does not collect k - 1 data and one has
no basis for testing whether profiles are reliably located. It seems quite
important for those studying similarity to investigate reliability directly by
obtaining two estimates for each profile. Reliability of k - 2 space measures has
ordinarily not been examined in past investigations of similarity.

While we have discussed the conditions under which measures which force equal
scatter on all persons can be made maximally useful, we do not recommend such
procedures. Our consideration of all possibilities leads us to suggest that the method
most generally advisable is the measure of equation (21), where k - 1 plane data are
combined with the measure of elevation using a deliberately chosen weight (which may
be zero) for elevation. (When the weight is unity, this measure is the same as D in
k space.) Excepting treatment of physiological and anthropometric measures, we know
of no psychological or educational problem where "correcting" profiles for scatter is
advantageous.

In  those  studies  where  k  -  2  space  measures  have  been  used  in  the  past,  properly 
interpreted  positive  results  need  not  be  discounted.  The  faults  to  which  we  have 
drawn attention operate to obscure true relations and to make the measurement
technique insensitive. This would make non-significant results, or low Q-correlations,
likely in some instances where a better technique would find more relationship. We
know of no biasing factor or systematic error in these procedures, however, which
would  have  introduced  significant  apparent  relations  where  none  should  be  found. 

The specialized problem of comparing a person's profile with his estimated
profile introduces an interesting minor question. Several such studies are listed in a
recent paper by Brown (4). The usual method is to administer (say) the Kuder
Preference Record, and then to require the person to rank his interest in the categories.
The profile from the test is rank-correlated with the estimated profile. But this
is not precisely the question that should be asked. If one were to predict the
interests of the average man, they would not all be equal; on the contrary, some
categories are generally more popular. The estimated profile, obtained by the usual
directions, is a k - 2 space profile based on the estimated strength of interests
relative to each other. The test profile is a k (or k - 1) profile based on the
estimated size of the deviations of the person's interests from the interests of the
norm group. These two profiles should not normally be highly correlated, because
the average popularity of the categories has been equalized in the test profile. To
determine if people can estimate their own test profiles, the experiment can be
redesigned to make the estimate more like the test in logical structure. Perhaps the
easiest technique would be to ask the person to guess his percentile standing on
each  category.  A  D  measure  (or  D  )  based  on  this  profile  would  take  into  account 
elevation  and  scatter,  and  would  correctly  compare  profiles  expressed  in  terms  of 
derived  scores. 

Short-Cut Formulas Based on Mean $D^2$

One use of measures of similarity is to compare any two persons. In research,
however, the questions more often relate to the similarity of two groups, or the
homogeneity of some particular group. If questions could be answered without
computing the measure of similarity for each pair of persons involved, it would be
possible to obtain the answers much more rapidly. We have discovered several
formulas based on $D^2$ which relate to such inquiries. Unfortunately, however, they
are based on the average of D squared for a set of pairs, and there seem to be no
similarly helpful approaches for obtaining the average of D directly. We have
indicated earlier the difficulties which make $D^2$ inappropriate as an interval scale
to measure distance. The following formulas are presented for three purposes. They
may be employed as a first rapid way of answering questions about groups, provided
the investigator recognizes that different results might be obtained if mean D or
median D had been determined instead of mean $D^2$. A second value of the formulas
is that they provide insight into the nature of distance measures. Factors which
increase mean $D^2$ will also, in general, increase mean D and median D, even though
not in the same amount. The third use is for checking computations.

Average similarity within group. It was previously noted that in any group,

$$\overline{D^2_{ii'}} = \frac{2N}{N-1} \sum_j \sigma_j^2$$

where i and i' vary over all persons in the sample (i and i' being different
persons), and $\sigma_j^2$ is the variance of variate j within the group, computed
with divisor N. This index expresses the average similarity of a group, i.e., its
homogeneity, except that it gives greater weight to large distances than would a
linear measure of distance. This formula might be used to compare the homogeneity
of one group with that of another, as in an inspection of a grouping of persons into
postulated "types".
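
The short-cut for within-group similarity can be checked numerically. The sketch below is an illustration added here, not part of the original report: the function names and the four hypothetical score-sets are assumptions, $D^2$ is averaged over pairs of different persons, and each variance is computed with divisor N. Under those conventions the directly computed mean $D^2$ equals 2N/(N - 1) times the sum of the variate variances.

```python
from itertools import combinations


def d_squared(p, q):
    """Squared distance D^2 between two score-sets (orthogonal model)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_d2_within(group):
    """Mean D^2 over all pairs of different persons, computed pair by pair."""
    pairs = list(combinations(group, 2))
    return sum(d_squared(p, q) for p, q in pairs) / len(pairs)


def sum_of_variances(group):
    """Sum over variates of the within-group variance (divisor N)."""
    n = len(group)
    total = 0.0
    for scores in zip(*group):  # one tuple of N scores per variate
        mean = sum(scores) / n
        total += sum((x - mean) ** 2 for x in scores) / n
    return total


def mean_d2_shortcut(group):
    """Short-cut: mean D^2 = 2N/(N-1) * (sum of variate variances)."""
    n = len(group)
    return 2 * n / (n - 1) * sum_of_variances(group)


# Hypothetical data: four persons measured on three variates.
group = [(1, 4, 2), (3, 0, 2), (5, 2, 6), (3, 2, 2)]
direct = mean_d2_within(group)
shortcut = mean_d2_shortcut(group)
print(direct, shortcut)
```

The direct route needs N(N - 1)/2 distances, whereas the short-cut needs only one pass over each variate, which is the saving the text describes.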

If $C_Y$ is the centroid of the sample in the space under analysis (whether it is
the center of the reference class or not), and i varies over the sample,

$$E^2 = \overline{D^2_{iC_Y}} = \frac{1}{N} \sum_i D^2_{iC_Y} \tag{29}$$

This is the mean second moment of persons about the centroid, and is analogous to a
variance for the distribution. It is not mathematically a variance, however, since
the mean E is greater than zero. We have referred to $E^2$ as a measure of dispersion.
It will be noted that if points are distributed on a hypersphere, $C_Y$ lies within
the hypersphere, and no one can fall at the centroid of the sample.

Formulas  comparable  to  the  above  can  easily  be  written  for  S  ,  and  for 
measures  in  which  weights  are  assigned  to  the  variates. 

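Similarity of person to group. For a single person, it may be interesting to
know his average distance from all other members in a group. If i is a member of
Group Y,

$$\overline{D^2_{ii'}} = \frac{1}{N-1} \sum_{i'} D^2_{ii'} = \frac{N}{N-1}\, D^2_{iC_Y} + \frac{1}{2}\, \overline{D^2_Y} \qquad (i = 1;\ i' = 2, 3, \ldots, N) \tag{30}$$

Here $D^2_{iC_Y}$ is $\sum_j (x_{ji} - \bar{x}_{j(Y)})^2$, the squared distance of person i
from the centroid of Group Y, and $\overline{D^2_Y}$ is the average similarity within
Group Y, i.e., the mean $D^2$ over pairs of different members.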

If i is not a member of Group Y,

$$\overline{D^2_{ii'}} = D^2_{iC_Y} + \frac{N-1}{2N}\, \overline{D^2_Y} \qquad (i \text{ not in } Y;\ i' = 1, 2, \ldots, N) \tag{31}$$

The difference between (30) and (31) is due to the inclusion of i in Y in the first
case. As N increases, (30) approaches (31). As before, one must bear in mind that
we have averaged the squared distances.
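
The two person-to-group cases can be verified on a small example. This is a hedged sketch rather than the authors' computation: the names and data are hypothetical, the within-group average is taken over pairs of different members, and the two short-cuts tested are (a) for a member, N/(N - 1) times his squared distance from the centroid plus half the within-group mean $D^2$, and (b) for an outsider, his squared distance from the centroid plus (N - 1)/(2N) times the within-group mean $D^2$.

```python
from itertools import combinations


def d_squared(p, q):
    """Squared distance D^2 between two score-sets."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(group):
    """Mean score on each variate for the group."""
    return tuple(sum(scores) / len(group) for scores in zip(*group))


def mean_d2_within(group):
    """Mean D^2 over all pairs of different members of the group."""
    pairs = list(combinations(group, 2))
    return sum(d_squared(p, q) for p, q in pairs) / len(pairs)


def mean_d2_to_others(person, others):
    """Direct average of D^2 between one person and each listed person."""
    return sum(d_squared(person, o) for o in others) / len(others)


# Hypothetical Group Y: four persons on three variates.
group_y = [(1, 4, 2), (3, 0, 2), (5, 2, 6), (3, 2, 2)]
n = len(group_y)
c_y = centroid(group_y)
within_y = mean_d2_within(group_y)

# Case (a): person i belongs to Group Y; average over the other N - 1 members.
member = group_y[0]
direct_member = mean_d2_to_others(member, group_y[1:])
shortcut_member = n / (n - 1) * d_squared(member, c_y) + 0.5 * within_y

# Case (b): person i is outside Group Y; average over all N members.
outsider = (0, 0, 0)
direct_outsider = mean_d2_to_others(outsider, group_y)
shortcut_outsider = d_squared(outsider, c_y) + (n - 1) / (2 * n) * within_y

print(direct_member, shortcut_member)
print(direct_outsider, shortcut_outsider)
```

Both coefficients tend to 1 and 1/2 as N grows, so the two cases converge for large groups.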

Distance between groups. The measure of similarity between two groups might
be found in $D_{C_Y C_Z}$, the separation of their centroids. This is the measure most
used in comparison of groups to test the null hypothesis, but we sometimes desire to
determine instead the average $D^2$ between members of the two groups. It permits us
to ask whether a group resembles another group as closely as members within the
group resemble each other. For this we have

$$\overline{D^2_{ii'}} = D^2_{C_Y C_Z} + E^2_Y + E^2_Z \qquad (i = 1, 2, \ldots, N_Y;\ i' = 1, 2, \ldots, N_Z) \tag{32}$$

where $E^2_Y$ and $E^2_Z$ are the values of $E^2$ for the two groups taken separately.
Here we see the average cross-similarity as made up of three components: squared
distance between group means, dispersion within the first group, and dispersion
within the second group.

The formula can be rewritten as follows, if $\sigma^2_{j(Y)}$ is the variance of j for
the population Y represents, etc.:

$$\overline{D^2_{ii'}} = \sum_j \left( \sigma^2_{j(Y)} + \sigma^2_{j(Z)} + \left( \bar{x}_{j(Y)} - \bar{x}_{j(Z)} \right)^2 \right) \tag{33}$$

There is one term for the variance within each group, and one which is twice the
variance between groups.
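
As a check on the decomposition of average cross-similarity, the following sketch compares the pair-by-pair mean $D^2$ between two groups with the sum, over variates, of the two within-group variances and the squared difference between group means. It is illustrative only: the function names and both hypothetical groups are assumptions, and variances are taken with divisor N within each sample.

```python
def d_squared(p, q):
    """Squared distance D^2 between two score-sets."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def variate_means(group):
    """Mean of each variate within the group."""
    return [sum(scores) / len(group) for scores in zip(*group)]


def variate_variances(group):
    """Variance of each variate within the group (divisor N)."""
    means = variate_means(group)
    return [
        sum((x - m) ** 2 for x in scores) / len(group)
        for scores, m in zip(zip(*group), means)
    ]


def mean_cross_d2(group_y, group_z):
    """Direct mean of D^2 over every (member of Y, member of Z) pair."""
    total = sum(d_squared(p, q) for p in group_y for q in group_z)
    return total / (len(group_y) * len(group_z))


def mean_cross_d2_shortcut(group_y, group_z):
    """Sum over variates of var_Y + var_Z + (mean_Y - mean_Z)^2."""
    terms = zip(
        variate_variances(group_y),
        variate_variances(group_z),
        variate_means(group_y),
        variate_means(group_z),
    )
    return sum(vy + vz + (my - mz) ** 2 for vy, vz, my, mz in terms)


# Hypothetical groups measured on the same three variates.
group_y = [(1, 4, 2), (3, 0, 2), (5, 2, 6), (3, 2, 2)]
group_z = [(0, 1, 1), (2, 3, 1)]
direct_cross = mean_cross_d2(group_y, group_z)
shortcut_cross = mean_cross_d2_shortcut(group_y, group_z)
print(direct_cross, shortcut_cross)
```

In this example the centroids are 8 squared units apart and the two dispersions are 7 and 2, so the groups overlap less than the spread inside Group Y alone would suggest.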

Summary  and  Recommendations 
Studies  attempting  to  determine  the  similarity  of  persons  have  used  a  variety 
of  statistical  procedures.  Some  of  these  procedures  are  more  advantageous  than 
others,  and  we  have  attempted  to  analyze  each  procedure  so  that  investigators  can 
choose  the  method  most  likely  to  reveal  the  effects  they  seek  to  measure. 
Each procedure for determining the similarity of two score-sets in effect
determines the distance between two points in a space defined by the variates. The
decisions  facing  the  investigator,  which  determine  what  results  he  will  obtain,  are: 
choice  of  variates,  choice  of  metric  for  each  variate,  assignment  of  weights  to 
variates,  and  choice  of  index  of  similarity.  The  choices  made  define  the  domain 
within  which  similarity  is  to  be  determined. 

For  profile  comparison  it  is  necessary  to  express  each  variate  in  a  scale  of 
equal units, such that units on the several scales are comparable. Unless the
investigator is satisfied to assume that his units do possess these qualities, he
should transform his variates or assign differential weights to them in order to
get units he regards as comparable.

An  index  may  be  based  on  either  an  orthogonal  or  an  oblique  model,  the  latter 
taking  into  account  the  correlation  among  variates.  All  indices  of  similarity  in 
general  use  in  psychological  studies  are  based  on  the  orthogonal  model.  We  propose 
an  index  D  of  this  type,  and  a  standardized  index  S  which  has  similar  properties. 
An oblique model treated by Mahalanobis leads to the index $\mathbb{D}$, which is especially
suited to classification problems where groups are defined a priori. It is closely
related to the Hotelling $T^2$ and the discriminant function. Our comparison shows that
D gives more weight to common factors among the variates than $\mathbb{D}$. As a consequence,
D is more reliable from trial to trial, and more stable from one sample of variates
to another. If variates have little correlation, D approaches $\mathbb{D}$. In general, for
descriptive purposes, the index D based on the orthogonal model seems superior to
$\mathbb{D}$ because of its greater stability. D, however, when applied to correlated variates,
has  certain  distorting  properties  which  cause  factors  to  have  greater  weight  in  some 
pairs  of  persons  than  in  others,  and  its  interpretation  is  unclear. 

If  it  is  desirable  to  take  correlation  into  account,  the  practical  procedure 
is to transform the correlated variates to an uncorrelated set, and apply the
orthogonal model. If possible, it is wise to begin with a set of nearly uncorrelated
variates, each reliably measured, or with a set of variates having only one general
("elevation") factor.

The various orthogonal indices may be classified as follows:

k space measures, which reflect differences in profile shape,
     elevation, and scatter. These include the measure D or
     S which we describe, Cattell's $r_p$, and one form of Pearson's
     CRL.

k - 1 hyperplane measures, which remove differences in elevation
     from the data before comparison of profiles. The index D'
     is used for such data. A special index $D_w$ is suggested
     which permits the investigator to reintroduce the elevation
     factor with any desired weight.

k - 1 hypersphere measures, which remove differences in eccentricity
     of profiles. Measures in this group are chiefly
     of theoretical interest.

k - 2 hypersphere measures, which remove differences in elevation
     and scatter from the profile. These include product-moment
     correlation, rho, tau, r , and correlation based
     on Stephenson's forced-sort procedure.
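
The classification above can be made concrete with a short sketch. It is an illustration under stated assumptions, not the report's own formulas: elevation is taken as the mean of a person's scores, scatter as the square root of the sum of squared deviations about that mean, and the weighted index is assumed to have the form $D_w^2 = D'^2 + w\,k\,(\text{difference in elevation})^2$, which reduces to $D^2$ when w is one and to $D'^2$ when w is zero. Equation (21) is not reproduced in this section, so the exact placement of the weight is hypothetical, as are all function names.

```python
import math


def elevation(p):
    """Mean of the person's k scores."""
    return sum(p) / len(p)


def deviations(p):
    """Scores expressed as deviations about the person's own mean."""
    m = elevation(p)
    return [x - m for x in p]


def scatter(p):
    """Length of the deviation profile (assumed definition of scatter)."""
    return math.sqrt(sum(d * d for d in deviations(p)))


def standardized(p):
    """Deviation profile divided by scatter; undefined for a flat profile."""
    s = scatter(p)
    if s == 0:
        raise ValueError("flat profile: scatter is zero")
    return [d / s for d in deviations(p)]


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def d_k(p, q):
    """k space: shape, elevation, and scatter all count."""
    return dist(p, q)


def d_prime(p, q):
    """k - 1 hyperplane: elevation removed."""
    return dist(deviations(p), deviations(q))


def d_double_prime(p, q):
    """k - 2 hypersphere: elevation and scatter removed."""
    return dist(standardized(p), standardized(q))


def d_w(p, q, w):
    """Assumed weighted index: D'^2 plus w * k * (elevation difference)^2."""
    k = len(p)
    gap = elevation(p) - elevation(q)
    return math.sqrt(d_prime(p, q) ** 2 + w * k * gap ** 2)


def q_correlation(p, q):
    """Product-moment correlation between two profiles across variates."""
    return sum(a * b for a, b in zip(standardized(p), standardized(q)))


# Hypothetical profiles on k = 3 variates.
p = (2, 4, 6)
q = (5, 7, 9)   # same shape and scatter as p, higher elevation
r = (1, 4, 7)   # same shape and elevation as p, larger scatter
s = (3, 1, 5)   # different shape
print(d_k(p, q), d_prime(p, q), d_prime(p, r), d_double_prime(p, r))
print(d_double_prime(p, s) ** 2, 2 * (1 - q_correlation(p, s)))
```

With scatter defined this way, the squared k - 2 space distance equals 2(1 - r), where r is the correlation between the two profiles, so ranking pairs by that distance and by Q-correlation gives the same order.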

The  investigator  should  eliminate  elevation  and  scatter  from  his  distance 
measure  only  if  there  is  a  psychological  reason  for  regarding  differences  in  these 
as unimportant. For most purposes, we regard the index $D_w$ as best suited to
similarity studies. When w is one, this becomes D. If D or $D_w$ is used, the
investigator treats as alike those people who have the same profile, but considers that
profiles having different elevation or different scatter are as truly different
as profiles having different high or low points. In contrast, measures in k - 1
space (based on deviations around the person's mean) and measures in k - 2 space
(with scores standardized in each profile) discard some of the most reliable
information in the score set. Profiles in k - 1 space are less reliably determined
than k-space profiles. In going to k - 2 space, error is greatly magnified for
persons with small scatter. Such magnified errors are likely to obscure true
relationships.
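
The magnification of error for persons with small scatter can be illustrated numerically. The sketch below is a constructed example, not data from the report: the two hypothetical "true" profiles, the error vector, and the function names are all assumptions. The same error is added to a nearly flat profile and to a profile with large scatter; in k space both points move the same distance, but after each profile is standardized (k - 2 space) the nearly flat profile moves much farther.

```python
import math


def standardized(p):
    """Deviations about the person's mean, divided by their length (scatter)."""
    m = sum(p) / len(p)
    dev = [x - m for x in p]
    length = math.sqrt(sum(d * d for d in dev))
    if length == 0:
        raise ValueError("flat profile: scatter is zero")
    return [d / length for d in dev]


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def add_error(p, e):
    return [x + err for x, err in zip(p, e)]


# Hypothetical true profiles on four variates, both with elevation 50.
nearly_flat = [50, 51, 50, 49]   # small scatter
varied = [30, 70, 45, 55]        # large scatter
error = [1, -1, 1, -1]           # identical error of measurement for both

# Displacement of each point in k space (raw scores).
shift_k_flat = dist(nearly_flat, add_error(nearly_flat, error))
shift_k_varied = dist(varied, add_error(varied, error))

# Displacement of each point in k - 2 space (standardized within profile).
shift_k2_flat = dist(standardized(nearly_flat),
                     standardized(add_error(nearly_flat, error)))
shift_k2_varied = dist(standardized(varied),
                       standardized(add_error(varied, error)))

print(shift_k_flat, shift_k_varied)
print(shift_k2_flat, shift_k2_varied)
```

Since all points in k - 2 space lie on a hypersphere of radius one under this standardization, a displacement approaching 1 for the nearly flat profile means its observed location says little about its true location.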

Most investigations have been based on k - 2 space measures. We do not
believe that such indices are generally the best for research on similarity. It
is true that some studies have successfully discovered relationships with these
measures. Measures in k - 2 space can be dependable when variates are reliably
measured, and where there are few flat profiles. Even in studies where k - 2
measures have been useful, a more powerful technique would be expected to produce
the results with greater clarity. In studies which yielded no significant relations
involving k - 2 similarity measures, an index such as $D_w$ might have found
relationships of importance.

In choosing between D, D', and $D_w$, the investigator must decide whether there
is an interpretable elevation factor, and whether this factor should be allowed to
influence his distance measure. If the variates do not have substantial positive
intercorrelation, we recommend that D, computed on the original measures in k space,
be used to determine dissimilarity. If the variates do generally measure a common
factor, the investigator should consider the meaning of this factor and decide
whether it is one he wishes to count. If he wishes to eliminate it from
consideration because he regards it as irrelevant to his problem, he will use D' as his
index. If he wishes to include the factor, he may choose an appropriate weight for it
and use $D_w$. The advantage of $D_w$ over D is that with substantially intercorrelated
variates the elevation factor may receive greater weight in D than it should,
relative to the weight given to the shape of the profile.

The distance index may be expressed in terms of D, $D^2$, or some transformation to
another scale. It appears unwise to force D into a correlation-like index ranging
from +1 to -1 as Cattell suggests. There is probably no limit on how dissimilar two
people can be, save as one is imposed by the method of gathering data. Hence in k
space or k - 1 hyperplane D can range from 0 (perfect similarity) to $\infty$. If
similarity is reported as $D^2$, we have useful formulas for mean $D^2$ under various
situations. $D^2$, however, seems to be less meaningful than D as a measure of distance,
especially as D is literally the distance between points in our geometric model. It
is also more likely to have statistical properties which make it possible to utilise
means, variances, and product-moment correlations. Thus we advise that $D^2$ be used
only in preliminary studies where its simplicity is of value and where ordering of
similarity is the major question. The use of Q as a measure should also be limited
to these conditions.

This  paper  has  given  little  attention  to  problems  of  reliability,  but  it  is 
clear  that  measures  of  distance  between  points  cannot  be  determined  dependably  if 
the locations of the points are undependable. Therefore, any steps the
investigator takes to make his profiles more reliable are well worth while.

Profile  research  is  necessarily  faced  with  severe  difficulties.  The  results 
of  any  investigation  are  influenced  by  numerous  choices  which  must  be  made  in  part 
arbitrarily.   Even  when  these  decisions  are  made  wisely,  the  difficulty  of  making 
reliable measurements on many variates at once is a severe one. We hope that in
spite  of  these  problems,  the  adoption  of  techniques  of  analysis  which  include  as 
much  information  as  the  data  permit,  and  which  do  not  introduce  additional  errors 
of  their  own,  will  permit  studies  of  similarity  to  advance  psychological  knowledge. 